Thread: Select max(foo) and select count(*) optimization
Speaking of special cases (well, I was on the admin list) there are two
kinds that would really benefit from some attention.

1. The query "select max(foo) from bar" where the column foo has an
   index. Aren't indexes ordered? If not, an "ordered index" would be
   useful in this situation so that this query, rather than doing a
   sequential scan of the whole table, would just "ask the index" for
   the max value and return nearly instantly.

2. The query "select count(*) from bar". Surely the total number of
   rows in a table is kept somewhere convenient. If not, it would be
   nice if it could be :) Again, rather than doing a sequential scan of
   the entire table, this type of query could return instantly.

I believe MySQL does both of these optimizations (which are probably a lot
easier in that product, given its data storage system). These were the first
areas where I noticed a big performance difference between MySQL and
Postgres.

Especially with very large tables, hearing the disks grind as Postgres scans
every single row in order to determine the number of rows in a table or the
max value of a column (even a primary key created from a sequence) is pretty
painful. If the implementation is not too horrendous, this is an area where
an orders-of-magnitude performance increase can be had.

-John
> Especially with very large tables, hearing the disks grind as Postgres scans
> every single row in order to determine the number of rows in a table or the
> max value of a column (even a primary key created from a sequence) is pretty
> painful. If the implementation is not too horrendous, this is an area where
> an orders-of-magnitude performance increase can be had.

Actually, it's very painful. For MySQL, they've accepted the concurrency hit
in order to accomplish it -- PostgreSQL would require a more subtle approach.

Anyway, with Rules you can force this:

    ON INSERT UPDATE counter SET tablecount = tablecount + 1;

    ON DELETE UPDATE counter SET tablecount = tablecount - 1;

You need to create a table "counter" with a single row that will keep track
of the number of rows in the table. Just remember, you've now serialized all
writes to the table, but in your situation it may be worthwhile.

The max(foo) optimization requires an extension to the aggregates system. It
will likely happen within a few releases. A workaround can be accomplished
today through the use of LIMIT and ORDER BY.
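For reference, a fully spelled-out version of that counter setup might look
like the following. This is a minimal sketch only: "bar" is the table from
the thread, while the "counter" layout and the rule names are illustrative.

    -- One-row side table holding the running count.
    CREATE TABLE counter (tablecount bigint NOT NULL);
    INSERT INTO counter VALUES (0);

    -- Keep the count in step with writes to "bar".
    CREATE RULE bar_count_ins AS ON INSERT TO bar
        DO ALSO UPDATE counter SET tablecount = tablecount + 1;
    CREATE RULE bar_count_del AS ON DELETE TO bar
        DO ALSO UPDATE counter SET tablecount = tablecount - 1;

    -- Cheap replacement for SELECT count(*) FROM bar:
    SELECT tablecount FROM counter;

Note that because the rule actions do not reference NEW or OLD, they fire
once per statement rather than once per row, so a single multi-row
INSERT ... SELECT would only bump the count by one; per-row triggers are the
more robust variant if that matters.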
On 1/5/04 2:52 PM, Rod Taylor wrote:
> The max(foo) optimization requires an extension to the aggregates system.
> It will likely happen within a few releases.

Looking forward to it.

> A workaround can be accomplished today through the use of LIMIT and ORDER BY.

Wowzers, I never imagined that that'd be so much faster. Thanks! :)

-John
John Siracusa <siracusa@mindspring.com> writes:
> 1. The query "select max(foo) from bar" where the column foo has an index.
> Aren't indexes ordered? If not, an "ordered index" would be useful in this
> situation so that this query, rather than doing a sequential scan of the
> whole table, would just "ask the index" for the max value and return nearly
> instantly.

http://www.postgresql.org/docs/current/static/functions-aggregate.html

-Neil
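Presumably the relevant tip there is the LIMIT/ORDER BY rewrite Rod
mentioned; roughly the following (a sketch, assuming an ordinary btree index
on bar.foo):

    -- Index-friendly stand-ins for max(foo) and min(foo):
    SELECT foo FROM bar WHERE foo IS NOT NULL
        ORDER BY foo DESC LIMIT 1;   -- max(foo)

    SELECT foo FROM bar WHERE foo IS NOT NULL
        ORDER BY foo ASC LIMIT 1;    -- min(foo)

The IS NOT NULL condition keeps the result consistent with the aggregates,
which ignore NULLs, since a plain ORDER BY foo DESC would otherwise return
NULLs first.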
Oops! siracusa@mindspring.com (John Siracusa) was seen spray-painting on a wall:
> Speaking of special cases (well, I was on the admin list) there are two
> kinds that would really benefit from some attention.
>
> 1. The query "select max(foo) from bar" where the column foo has an
> index. Aren't indexes ordered? If not, an "ordered index" would be
> useful in this situation so that this query, rather than doing a
> sequential scan of the whole table, would just "ask the index" for
> the max value and return nearly instantly.
>
> 2. The query "select count(*) from bar". Surely the total number of
> rows in a table is kept somewhere convenient. If not, it would be
> nice if it could be :) Again, rather than doing a sequential scan of
> the entire table, this type of query could return instantly.
>
> I believe MySQL does both of these optimizations (which are probably
> a lot easier in that product, given its data storage system). These
> were the first areas where I noticed a big performance difference
> between MySQL and Postgres.
>
> Especially with very large tables, hearing the disks grind as
> Postgres scans every single row in order to determine the number of
> rows in a table or the max value of a column (even a primary key
> created from a sequence) is pretty painful. If the implementation
> is not too horrendous, this is an area where an orders-of-magnitude
> performance increase can be had.

These are both VERY frequently asked questions.

In the case of question #1, the optimization you suggest could be
accomplished via some Small Matter Of Programming. None of the people that
have wanted the optimization have, however, offered to actually DO the
programming.

In the case of #2, the answer is "surely NOT." In MVCC databases, that
information CANNOT be stored anywhere convenient because queries requested
by transactions started at different points in time must get different
answers.

I think we need to add these questions and their answers to the FAQ so that
the answer can be "See FAQ Item #17" rather than people having to
gratuitously explain it over and over and over again.
--
(reverse (concatenate 'string "moc.enworbbc" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/finances.html
Rules of the Evil Overlord #127. "Prison guards will have their own cantina
featuring a wide variety of tasty treats that will deliver snacks to the
guards while on duty. The guards will also be informed that accepting food
or drink from any other source will result in execution."
<http://www.eviloverlord.com/>
Not that I'm offering to do the programming mind you, :) but . . .

In the case of select count(*), one optimization is to do a scan of the
primary key, not the table itself, if the table has a primary key. In a
certain commercial, lesser database, this is called an "index fast full
scan". It would be important to scan the index in physical order (sequential
physical IO) and not in key order (random physical IO).

I'm guessing the payoff as well as the real-world utility of a max(xxx)
optimization are much higher than a count(*) optimization, though.

On Mon, 2004-01-05 at 12:26, Christopher Browne wrote:
> Oops! siracusa@mindspring.com (John Siracusa) was seen spray-painting on a wall:
> > Speaking of special cases (well, I was on the admin list) there are two
> > kinds that would really benefit from some attention.
> >
> > 1. The query "select max(foo) from bar" where the column foo has an
> > index. Aren't indexes ordered? If not, an "ordered index" would be
> > useful in this situation so that this query, rather than doing a
> > sequential scan of the whole table, would just "ask the index" for
> > the max value and return nearly instantly.
> >
> > 2. The query "select count(*) from bar". Surely the total number of
> > rows in a table is kept somewhere convenient. If not, it would be
> > nice if it could be :) Again, rather than doing a sequential scan of
> > the entire table, this type of query could return instantly.
> >
> > I believe MySQL does both of these optimizations (which are probably
> > a lot easier in that product, given its data storage system). These
> > were the first areas where I noticed a big performance difference
> > between MySQL and Postgres.
> >
> > Especially with very large tables, hearing the disks grind as
> > Postgres scans every single row in order to determine the number of
> > rows in a table or the max value of a column (even a primary key
> > created from a sequence) is pretty painful. If the implementation
> > is not too horrendous, this is an area where an orders-of-magnitude
> > performance increase can be had.
>
> These are both VERY frequently asked questions.
>
> In the case of question #1, the optimization you suggest could be
> accomplished via some Small Matter Of Programming. None of the people
> that have wanted the optimization have, however, offered to actually
> DO the programming.
>
> In the case of #2, the answer is "surely NOT." In MVCC databases,
> that information CANNOT be stored anywhere convenient because queries
> requested by transactions started at different points in time must get
> different answers.
>
> I think we need to add these questions and their answers to the FAQ so
> that the answer can be "See FAQ Item #17" rather than people having to
> gratuitously explain it over and over and over again.
Paul Tuckfield <paul@tuckfield.com> writes:
> In the case of select count(*), one optimization is to do a scan of the
> primary key, not the table itself, if the table has a primary key. In a
> certain commercial, lesser database, this is called an "index fast full
> scan". It would be important to scan the index in physical order
> (sequential physical IO) and not in key order (random physical IO).

That won't work because you still have to hit the actual tuple to determine
visibility.

-Doug
Martha Stewart called it a Good Thing when paul@tuckfield.com (Paul Tuckfield) wrote:
> Not that I'm offering to do the programming mind you, :) but . . .
>
> In the case of select count(*), one optimization is to do a scan of the
> primary key, not the table itself, if the table has a primary key. In a
> certain commercial, lesser database, this is called an "index fast full
> scan". It would be important to scan the index in physical order
> (sequential physical IO) and not in key order (random physical IO).

The problem is that this "optimization" does not actually work. The index
does not contain transaction visibility information, so you have to go to
the pages of tuples in order to determine if any given tuple is visible.

> I'm guessing the payoff as well as the real-world utility of a max(xxx)
> optimization are much higher than a count(*) optimization, though.

That's probably so. In many cases, approximations, such as page counts, may
be good enough, and, pray consider, that ("an approximation") is probably
all you were getting from the database systems that had an "optimization" to
store the count in a counter.
--
let name="cbbrowne" and tld="ntlug.org" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/linuxxian.html
"No, you misunderstand. Microsoft asked some hackers how they could make
their system secure - the hackers replied "Turn it off.". So they did."
-- Anthony Ord
pg@rbt.ca (Rod Taylor) wrote:
>> Especially with very large tables, hearing the disks grind as Postgres scans
>> every single row in order to determine the number of rows in a table or the
>> max value of a column (even a primary key created from a sequence) is pretty
>> painful. If the implementation is not too horrendous, this is an area where
>> an orders-of-magnitude performance increase can be had.
>
> Actually, it's very painful. For MySQL, they've accepted the concurrency hit
> in order to accomplish it -- PostgreSQL would require a more subtle approach.
>
> Anyway, with Rules you can force this:
>
>     ON INSERT UPDATE counter SET tablecount = tablecount + 1;
>
>     ON DELETE UPDATE counter SET tablecount = tablecount - 1;
>
> You need to create a table "counter" with a single row that will keep track
> of the number of rows in the table. Just remember, you've now serialized all
> writes to the table, but in your situation it may be worthwhile.

There's a still more subtle approach that relieves the serialization
constraint, at some cost...

- You add rules that _insert_ a row each time there is an insert/delete:

    ON INSERT insert into counts(table, value) values ('our_table', 1);
    ON DELETE insert into counts(table, value) values ('our_table', -1);

- The "select count(*) from our_table" is replaced by
  "select sum(value) from counts where table = 'our_table'"

- Periodically, a "compression" process goes through and either:

  a) Deletes the rows for 'our_table' and replaces them with one row with a
     conventionally-scanned 'count(*)' value, or

  b) Computes "select table, sum(value) as value from counts group by table",
     deletes all the existing rows in counts, and replaces them by the
     preceding selection, or

  c) Perhaps does something incremental that's like b), but which only
     processes parts of the "counts" table at once. Process 500 rows, then
     COMMIT, or something of the sort...

Note that this "counts" table can potentially grow _extremely_ large. The
"win" comes when it gets compressed, so that instead of scanning through
500K items, it index-scans through 27: the 1 that has the "497000" that was
the state of the table at the last compression, and then 26 singletons.

A win comes in if an INSERT that adds in 50 rows can lead to inserting
('our_table', 50) into COUNTS, or a delete that eliminates 5000 rows puts in
('our_table', -5000).

It's vital to run the "compression" reasonably often (much like VACUUM :-))
in order that the COUNTS summary table stays relatively small.
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/wp.html
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.  -- Brian W. Kernighan
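Spelled out, the delta-row variant might look like this. It is a sketch
only: the "table" column is renamed "tablename" here because "table" is a
reserved word, and the rule names are made up.

    CREATE TABLE counts (tablename text NOT NULL, value bigint NOT NULL);

    CREATE RULE our_table_count_ins AS ON INSERT TO our_table
        DO ALSO INSERT INTO counts VALUES ('our_table', 1);
    CREATE RULE our_table_count_del AS ON DELETE TO our_table
        DO ALSO INSERT INTO counts VALUES ('our_table', -1);

    -- Replacement for SELECT count(*) FROM our_table:
    SELECT sum(value) FROM counts WHERE tablename = 'our_table';

    -- Periodic "compression" along the lines of option b) above:
    BEGIN;
    LOCK TABLE counts IN SHARE ROW EXCLUSIVE MODE;  -- briefly block writers
    CREATE TEMP TABLE counts_rollup AS
        SELECT tablename, sum(value) AS value FROM counts GROUP BY tablename;
    DELETE FROM counts;
    INSERT INTO counts SELECT * FROM counts_rollup;
    DROP TABLE counts_rollup;
    COMMIT;

The explicit lock is only there to keep the rollup from racing against
concurrent inserts into counts; option c) above trades that simplicity for
smaller batches. As with the earlier rule, these actions fire once per
statement, so recording bulk deltas like ('our_table', 50) for a multi-row
INSERT would need triggers instead.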
On Tuesday 06 January 2004 07:16, Christopher Browne wrote:
> Martha Stewart called it a Good Thing when paul@tuckfield.com (Paul Tuckfield) wrote:
> > Not that I'm offering to do the programming mind you, :) but . . .
> >
> > In the case of select count(*), one optimization is to do a scan of the
> > primary key, not the table itself, if the table has a primary key. In a
> > certain commercial, lesser database, this is called an "index fast full
> > scan". It would be important to scan the index in physical order
> > (sequential physical IO) and not in key order (random physical IO).
>
> The problem is that this "optimization" does not actually work. The
> index does not contain transaction visibility information, so you have
> to go to the pages of tuples in order to determine if any given tuple
> is visible.

Adding transaction visibility information to indexes was rejected as an
idea. At the time I proposed it, my idea was to vacuum tuples at the page
level while postgresql pushes buffers out of the shared cache. If indexes
had visibility information, they could be cleaned independently of the heap
tuples. This wouldn't have eliminated vacuum entirely, but at least
frequently hit data would be kept clean.

But it was rejected because of the associated overhead. Just thought it was
worth a mention...

Shridhar
On Tuesday 06 January 2004 01:22, Rod Taylor wrote:
> Anyway, with Rules you can force this:
>
> ON INSERT UPDATE counter SET tablecount = tablecount + 1;
>
> ON DELETE UPDATE counter SET tablecount = tablecount - 1;

That would generate a lot of dead tuples in the counter table. How about

    select relpages, reltuples from pg_class where relname = <tablename>;

Assuming the stats are recent enough, it would be much faster and accurate...

Shridhar
On January 6, 2004 01:42 am, Shridhar Daithankar wrote:
> On Tuesday 06 January 2004 01:22, Rod Taylor wrote:
> > Anyway, with Rules you can force this:
> >
> > ON INSERT UPDATE counter SET tablecount = tablecount + 1;
> >
> > ON DELETE UPDATE counter SET tablecount = tablecount - 1;
>
> That would generate a lot of dead tuples in the counter table. How about
>
>     select relpages, reltuples from pg_class where relname = <tablename>;
>
> Assuming the stats are recent enough, it would be much faster and
> accurate...

Well, I did this:

cert=# select relpages,reltuples from pg_class where relname= 'certificate';
 relpages |  reltuples
----------+-------------
   399070 | 2.48587e+07
(1 row)

Casting seemed to help:

cert=# select relpages,reltuples::bigint from pg_class where relname=
'certificate';
 relpages | reltuples
----------+-----------
   399070 |  24858736
(1 row)

But:

cert=# select count(*) from certificate;
[*Crunch* *Crunch* *Crunch*]
  count
----------
 19684668
(1 row)

Am I missing something? Max certificate_id is 20569544 btw.
--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
On Tuesday 06 January 2004 17:48, D'Arcy J.M. Cain wrote:
> On January 6, 2004 01:42 am, Shridhar Daithankar wrote:
> cert=# select relpages,reltuples::bigint from pg_class where relname=
> 'certificate';
>  relpages | reltuples
> ----------+-----------
>    399070 |  24858736
> (1 row)
>
> But:
>
> cert=# select count(*) from certificate;
> [*Crunch* *Crunch* *Crunch*]
>   count
> ----------
>  19684668
> (1 row)
>
> Am I missing something? Max certificate_id is 20569544 btw.

Do 'vacuum analyze certificate' and try... :-)

The numbers from pg_class are estimates updated by vacuum/analyze. Of
course, you need to run vacuum frequently enough for those statistics to be
updated all the time, or run the autovacuum daemon.

Ran into the same problem on my machine till I remembered about vacuum... :-)

Shridhar
On Tue, 2004-01-06 at 07:20, Shridhar Daithankar wrote:
> On Tuesday 06 January 2004 17:48, D'Arcy J.M. Cain wrote:
> > On January 6, 2004 01:42 am, Shridhar Daithankar wrote:
> > cert=# select relpages,reltuples::bigint from pg_class where relname=
> > 'certificate';
> >  relpages | reltuples
> > ----------+-----------
> >    399070 |  24858736
> > (1 row)
> >
> > But:
> >
> > cert=# select count(*) from certificate;
> > [*Crunch* *Crunch* *Crunch*]
> >   count
> > ----------
> >  19684668
> > (1 row)
> >
> > Am I missing something? Max certificate_id is 20569544 btw.
>
> Do 'vacuum analyze certificate' and try... :-)
>
> The numbers from pg_class are estimates updated by vacuum/analyze. Of
> course, you need to run vacuum frequently enough for those statistics to be
> updated all the time, or run the autovacuum daemon.
>
> Ran into the same problem on my machine till I remembered about vacuum... :-)

Actually you only need to run analyze to update the statistics.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Robert Treat wrote:
> On Tue, 2004-01-06 at 07:20, Shridhar Daithankar wrote:
>> The numbers from pg_class are estimates updated by vacuum/analyze. Of
>> course, you need to run vacuum frequently enough for those statistics to
>> be updated all the time, or run the autovacuum daemon.
>>
>> Ran into the same problem on my machine till I remembered about vacuum... :-)
>
> Actually you only need to run analyze to update the statistics.

Old habits die hard... :-)

Shridhar
If this situation persists after 'analyze certificate', then you need to
either increase the statistics target:

    'alter table certificate alter column certificate_id set statistics 100'

or run:

    'vacuum full certificate'

i.e.: there are lots of (dead) updated or deleted tuples in the relation,
distributed in such a way as to throw off analyze's estimate.

regards

Mark

D'Arcy J.M. Cain wrote:
> Well, I did this:
>
> cert=# select relpages,reltuples from pg_class where relname= 'certificate';
>  relpages |  reltuples
> ----------+-------------
>    399070 | 2.48587e+07
> (1 row)
>
> Casting seemed to help:
>
> cert=# select relpages,reltuples::bigint from pg_class where relname=
> 'certificate';
>  relpages | reltuples
> ----------+-----------
>    399070 |  24858736
> (1 row)
>
> But:
>
> cert=# select count(*) from certificate;
> [*Crunch* *Crunch* *Crunch*]
>   count
> ----------
>  19684668
> (1 row)
>
> Am I missing something? Max certificate_id is 20569544 btw.
On January 6, 2004 07:20 am, Shridhar Daithankar wrote:
> On Tuesday 06 January 2004 17:48, D'Arcy J.M. Cain wrote:
> > On January 6, 2004 01:42 am, Shridhar Daithankar wrote:
> > cert=# select relpages,reltuples::bigint from pg_class where relname=
> > 'certificate';
> >  relpages | reltuples
> > ----------+-----------
> >    399070 |  24858736
> > (1 row)
> >
> > But:
> >
> > cert=# select count(*) from certificate;
> > [*Crunch* *Crunch* *Crunch*]
> >   count
> > ----------
> >  19684668
> > (1 row)
> >
> > Am I missing something? Max certificate_id is 20569544 btw.
>
> Do 'vacuum analyze certificate' and try... :-)

Kind of invalidates the part about being accurate then, don't it?

Besides, I vacuum that table every day (*) and we have reorganized the
schema so that we never update it except in exceptional cases. I would be
less surprised if the result was less than the real count since we only
insert into that table. In any case, if I have to vacuum a 20,000,000 row
table to get an accurate count then I may as well run count(*) on it.

(*): Actually I only analyze but I understand that that should be sufficient.
--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > In any case, if I have to vacuum a 20,000,000 row table to get an accurate > count then I may as well run count(*) on it. > (*): Actually I only analyze but I understand that that should be sufficient. ANALYZE without VACUUM will deliver a not-very-accurate estimate, since it only looks at a sample of the table's pages and doesn't grovel through every one. Any of the VACUUM variants, on the other hand, will set pg_class.reltuples reasonably accurately (as the number of rows actually seen and left undeleted by the VACUUM pass). There are pathological cases where ANALYZE's estimate of the overall row count can be horribly bad --- mainly, when the early pages of the table are empty or nearly so, but there are well-filled pages out near the end. I have a TODO item to try to make ANALYZE less prone to getting fooled that way... regards, tom lane
Hi,

Shridhar Daithankar wrote:
> select relpages,reltuples from pg_class where relname=<tablename>;
>
> Assuming the stats are recent enough, it would be much faster and accurate..

This needs an "analyze <tablename>;" before the select from pg_class,
because only after an analyze will Postgres update pg_class.

C.