Thread: Re: [HACKERS] Much Ado About COUNT(*)
Merlin Moncure wrote:
> 6. for large tables, you can get a pretty accurate count by doing:
>   select count(*) * 10 from t where random() > .9;
> on my setup, this shaved about 15% off of the counting time...YMMV.

That's an interesting idea, using sampling to get an estimate. Thanks for the tip.

--
dave
David Garamond <lists@zara.6.isreserved.com> writes:

> Merlin Moncure wrote:
> > 6. for large tables, you can get a pretty accurate count by doing:
> >   select count(*) * 10 from t where random() > .9;
> > on my setup, this shaved about 15% off of the counting time...YMMV.
>
> That's an interesting idea, using sampling to get an estimate.

It's an interesting idea, but this particular implementation isn't going to
save any time. It still has to read every record, only now it has to spend
extra time doing a random() and the arithmetic.

In order for sampling to speed things up you would have to use an index to
actually reduce the number of records read.

The database could be clever and implement the same kind of sampling vacuum
does. That picks a random sampling of pages from the table without using an
index. But there's no way to implement the same kind of behaviour from the
user-visible features.

--
greg
[snip]
> The database could be clever and implement the same kind of sampling vacuum
> does. That picks a random sampling of pages from the table without using an
> index. But there's no way to implement the same kind of behaviour from the
> user-visible features.

... meaning perhaps a new keyword accepted by SELECT, something like
"SAMPLE 1000"? Which would mean sample records in a 1:1000 ratio?
Would simplify (and probably speed up a lot) some estimating queries...

Cheers,
Csaba.
Csaba Nagy <nagy@ecircle-ag.com> writes:

> [snip]
> > The database could be clever and implement the same kind of sampling vacuum
> > does. That picks a random sampling of pages from the table without using an
> > index. But there's no way to implement the same kind of behaviour from the
> > user-visible features.
>
> ... meaning perhaps a new keyword accepted by SELECT, something like
> "SAMPLE 1000"? Which would mean sample records in a 1:1000 ratio?
> Would simplify (and probably speed up a lot) some estimating queries...

See:

  http://www.jlcomp.demon.co.uk/faq/random.html

I think the Oracle syntax looks like

  SELECT * FROM foo SAMPLE (0.1)

I don't think I would have picked this syntax but it seems like a better idea
to copy the existing practice rather than invent a new one.

There are some details, like what to do when there's a WHERE clause or joins.
Oracle disallows joins entirely and I'm unclear what the best thing to do
about where clauses would be.

--
greg
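For illustration, a rough count estimate written against that Oracle syntax
might look like the following sketch; the table name foo is hypothetical, and
since SAMPLE takes a percentage, (0.1) reads roughly 0.1% of the rows and the
result is scaled back up:

  -- estimate only: the sample is random, so repeated runs will give
  -- slightly different answers
  SELECT COUNT(*) * 1000 AS estimated_rows
    FROM foo SAMPLE (0.1);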
[snip]
> See:
>
>   http://www.jlcomp.demon.co.uk/faq/random.html
>
> I think the Oracle syntax looks like
>
>   SELECT * FROM foo SAMPLE (0.1)
>
> I don't think I would have picked this syntax but it seems like a better idea
> to copy the existing practice rather than invent a new one.
>
> There are some details, like what to do when there's a WHERE clause or joins.
> Oracle disallows joins entirely and I'm unclear what the best thing to do
> about where clauses would be.

The where clauses could be applied exactly as for a normal select, with the
sampling being just a pre-filtering condition for what rows to consider.

If there were a way to specify the table on which to apply the sampling, then
the whole construct could be replaced automatically by the inline view the
Oracle link recommends. I doubt there would be any benefit in sampling more
than one table in a query; it should just work to sample the biggest table
and join the result with the others. Sampling is only useful for really big
tables anyway.

So the syntax above could be extended to:

  SELECT * FROM foo SAMPLE (foo, 0.1)

and:

  SELECT foo.*, bar.* FROM foo, bar WHERE foo.key = bar.key SAMPLE (foo, 0.1)

which means sample foo and join the result with bar.

All this makes sense from a user point of view; I wonder how big a PITA it is
to implement...

Cheers,
Csaba.
On 1/13/05 9:50 AM, "Greg Stark" <gsstark@mit.edu> wrote:

> I think the Oracle syntax looks like
>
>   SELECT * FROM foo SAMPLE (0.1)
>
> I don't think I would have picked this syntax but it seems like a better idea
> to copy the existing practice rather than invent a new one.

Of course, in Oracle 'count(*)' is instantaneous. It doesn't have to count
the physical records one by one.

Wes
Wes <wespvp@syntegra.com> writes:

> On 1/13/05 9:50 AM, "Greg Stark" <gsstark@mit.edu> wrote:
>
> Of course, in Oracle 'count(*)' is instantaneous. It doesn't have to count
> the physical records one by one.

That's simply false. Oracle does indeed have to count the records one by one.

It doesn't have to read and ignore the dead records since they're in a
separate place (but on the other hand it sometimes has to go read that
separate place when it sees records that were committed after your
transaction).

It can also do index-only scans, which often helps, but it's still not
instantaneous.

--
greg
On 1/13/05 6:44 PM, "Greg Stark" <gsstark@mit.edu> wrote:

> That's simply false. Oracle does indeed have to count the records one by one.
>
> It doesn't have to read and ignore the dead records since they're in a
> separate place (but on the other hand it sometimes has to go read that
> separate place when it sees records that were committed after your
> transaction).
>
> It can also do index-only scans, which often helps, but it's still not
> instantaneous.

Ok, I stand corrected - I was given some wrong information. However, my
experience has been that count(*) on Oracle is a whole lot faster than
PostgreSQL - what appeared instantaneous on Oracle took some time on
PostgreSQL. That was one of the first things I noticed when moving a
database application to PostgreSQL. I've since disposed of the Oracle
database, so I can't go back and retest.

Wes
Wes <wespvp@syntegra.com> writes:

> On 1/13/05 6:44 PM, "Greg Stark" <gsstark@mit.edu> wrote:
>
> > That's simply false. Oracle does indeed have to count the records one by one.
>
> Ok, I stand corrected - I was given some wrong information. However, my
> experience has been that count(*) on Oracle is a whole lot faster than
> PostgreSQL - what appeared instantaneous on Oracle took some time on
> PostgreSQL. That was one of the first things I noticed when moving a
> database application to PostgreSQL. I've since disposed of the Oracle
> database, so I can't go back and retest.

If it was instantaneous then the data must have all been in cache. A lot of
Oracle kudos really come down to the fact that Oracle is often run on beefier
machines than others.

But if it was merely 2x as fast or so, more if the table was really wide,
then it could have just been because of the fast index-only scan.

If it was more than 2-4x as fast for a narrow table and you don't think the
whole thing was in cache, then I would start to wonder whether your Postgres
table suffered from bloat, either from not having vacuum run frequently
enough or from having the fsm settings too low.

--
greg
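A quick way to check for that kind of bloat is to compare the on-disk size
with what the planner statistics suggest. A sketch against the system
catalogs, where foo is a hypothetical table name and the figures are only as
fresh as the last VACUUM/ANALYZE:

  -- if approx_bytes_on_disk is far larger than reltuples times the average
  -- row width, the table has likely bloated (vacuum more often, or raise
  -- the fsm settings)
  SELECT relname, relpages, reltuples,
         relpages * 8192 AS approx_bytes_on_disk   -- assumes the 8k block size
    FROM pg_class
   WHERE relname = 'foo';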
This is probably stupid for some reason, but why not use a 64-bit integer to
track the number of records in the table? Increment when adding records,
decrement when deleting them... then COUNT(*) could just return that in cases
where a query is known to be looking at all of the records?

On Jan 14, 2005, at 12:04 PM, Wes wrote:

> Ok, I stand corrected - I was given some wrong information. However, my
> experience has been that count(*) on Oracle is a whole lot faster than
> PostgreSQL - what appeared instantaneous on Oracle took some time on
> PostgreSQL. That was one of the first things I noticed when moving a
> database application to PostgreSQL. I've since disposed of the Oracle
> database, so I can't go back and retest.
>
> Wes

-----------------------------------------------------------
Frank D. Engel, Jr. <fde101@fjrhome.net>
Frank D. Engel, Jr. wrote:
> This is probably stupid for some reason, but why not use a 64-bit
> integer to track the number of records in the table? Increment when
> adding records, decrement when deleting them... then COUNT(*) could just
> return that in cases where a query is known to be looking at all of the
> records?

Check the list archives for details, but you need to consider multiple
backends inserting/deleting concurrently. What you need is a separate little
table where you can log your transaction-id and the number of rows
added/removed; then you can figure out how many rows there are from different
viewpoints.

--
Richard Huxton
Archonet Ltd
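A minimal sketch of that kind of side table, with hypothetical names (foo and
foo_count_log) and Richard's transaction-id column left out for brevity.
Because the delta rows are ordinary MVCC rows, each transaction's sum
reflects its own viewpoint, which is exactly the point being made:

  -- one delta row per row inserted into or deleted from foo
  CREATE TABLE foo_count_log (
      delta bigint NOT NULL     -- +1 for an insert, -1 for a delete
  );

  CREATE OR REPLACE FUNCTION foo_count_trig() RETURNS trigger AS '
  BEGIN
      IF TG_OP = ''INSERT'' THEN
          INSERT INTO foo_count_log VALUES (1);
          RETURN NEW;
      ELSE
          INSERT INTO foo_count_log VALUES (-1);
          RETURN OLD;
      END IF;
  END;
  ' LANGUAGE plpgsql;

  CREATE TRIGGER foo_count
      AFTER INSERT OR DELETE ON foo
      FOR EACH ROW EXECUTE PROCEDURE foo_count_trig();

  -- the row count as seen from the current transaction's viewpoint
  SELECT sum(delta) AS approximate_count FROM foo_count_log;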
On Fri, Jan 14, 2005 at 12:39:04PM -0500, Frank D. Engel, Jr. wrote:
> This is probably stupid for some reason, but why not use a 64-bit
> integer to track the number of records in the table? Increment when
> adding records, decrement when deleting them... then COUNT(*) could
> just return that in cases where a query is known to be looking at all
> of the records?

Because there is no single value for count(*): if you're in a transaction
that has added records it will be bigger than in a transaction that hasn't.
How does your integer deal with this?

The usual solutions to this involve locking, which is precisely what MVCC is
designed to avoid.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Yep, that could cause problems. Okay, now I'm joining the program.

The only thing I can see that would fix this for the integer would be to keep
track of the number of 'committed' records using the integer, then for each
new transaction, make a copy of the integer in order to remember its "place",
using an additional integer to track the offset (how many rows have been
added or removed), so that it can be correctly applied at commit time.

It's probably too messy to be worthwhile this way, though. More trouble than
it would be worth.

On Jan 14, 2005, at 1:38 PM, Martijn van Oosterhout wrote:

> Because there is no single value for count(*): if you're in a transaction
> that has added records it will be bigger than in a transaction that hasn't.
> How does your integer deal with this?
>
> The usual solutions to this involve locking, which is precisely what MVCC
> is designed to avoid.

-----------------------------------------------------------
Frank D. Engel, Jr. <fde101@fjrhome.net>
Hello,

Here I'm implementing session management, which has a connections table
partitioned between active and archived connections. A connection represents
a connection between a user and a chatroom. I use partitioning for
performance reasons.

The active table contains all the data for the active session: user_id,
chatroom_id, session start time, and other information. The archive table
contains just the user_id, chatroom_id, session start and end time, for
logging purposes, and for displaying on the site which user was logged to
which chatroom and from when to when.

Thus, when a user disconnects from a chatroom, I must move one row from the
active to the archive table. This poses no problem as there is a UNIQUE index
(user_id, chatroom_id), so I select the row FOR UPDATE, insert it in the
archive table, then delete it.

Now, when a user logs out from the site, or when his session is purged by the
auto-expiration cron job, I must also expire ALL his open chatroom
connections:

  INSERT INTO archive (...) SELECT ... FROM active WHERE user_id = ...;
  DELETE FROM active WHERE user_id = ...;

Now, if the user inserts a connection between the two queries above, the
thing will fail (the connection will just be deleted). I know that there are
many ways to do it right:

- LOCK the table in exclusive mode
- use an additional primary key on the active table which is not related to
  the user_id and the chatroom_id, select the id's of the sessions to expire
  into a temporary table, and use that
- use an extra field in the table to mark that the rows are being processed
- use transaction isolation level SERIALIZABLE

However, all these methods somehow don't feel right, and as this is an often
encountered problem, I'd really like to have a SQL command, say MOVE, or
SELECT AND DELETE, whatever, which acts like a SELECT, returning the rows,
but deleting them as well. Then I'd just do:

  INSERT INTO archive (...) SELECT ... AND DELETE FROM active WHERE user_id = ...;

which would have the following advantages:

- No worries about locks:
  - less chance of bugs
  - higher performance, because locks have to be waited on, by definition
- No need to do the request twice (so, it is twice as fast!)
- Simplicity and elegance

There would be a hidden bonus: if you acquire locks, you had better COMMIT
the transaction as soon as possible to release them, whereas here, you can
happily continue in the transaction.

I think this command would make a nice cousin to the also very popular
INSERT... OR UPDATE, which tries to insert a row and, if it exists, UPDATEs
it instead of inserting it!

What do you think?
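For reference, the single-row move described above might be sketched like
this; the column names (started_at, ended_at) and the literal ids are
hypothetical:

  BEGIN;
  -- lock the one matching row so nobody can update or delete it underneath us
  SELECT * FROM active
      WHERE user_id = 42 AND chatroom_id = 7 FOR UPDATE;
  -- copy it to the archive, then remove it from the live table
  INSERT INTO archive (user_id, chatroom_id, started_at, ended_at)
      SELECT user_id, chatroom_id, started_at, now()
        FROM active WHERE user_id = 42 AND chatroom_id = 7;
  DELETE FROM active WHERE user_id = 42 AND chatroom_id = 7;
  COMMIT;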
On Fri, Jan 14, 2005 at 08:49:24PM +0100, PFC wrote:
> ... when his session is purged by the auto-expiration cron job, I must also
> expire ALL his open chatroom connections:
>
>   INSERT INTO archive (...) SELECT ... FROM active WHERE user_id = ...;
>   DELETE FROM active WHERE user_id = ...;
>
> Now, if the user inserts a connection between the two queries above, the
> thing will fail (the connection will just be deleted). I know that there
> are many ways to do it right:

Why not just do it in a single transaction? I don't think you need to use
SERIALIZABLE at all, I think normal read-committed mode will do what you
want, no?

  BEGIN;
  INSERT INTO archive (...) SELECT ... FROM active WHERE user_id = ...;
  DELETE FROM active WHERE user_id = ...;
  COMMIT;

The DELETE can only delete the rows returned by the select, that's the
whole point of transactions...

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Martijn van Oosterhout <kleptog@svana.org> writes:
> Why not just do it in a single transaction? I don't think you need to
> use SERIALIZABLE at all, I think normal read-committed mode will do
> what you want, no?

>   BEGIN;
>   INSERT INTO archive (...) SELECT ... FROM active WHERE user_id = ...;
>   DELETE FROM active WHERE user_id = ...;
>   COMMIT;

No, that's exactly wrong: in read-committed mode the DELETE could delete rows
that were not seen by the SELECT. It would work in serializable mode though.

			regards, tom lane
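A sketch of the serializable variant Tom refers to, reusing the hypothetical
column names and ids from the earlier sketch; both statements then work from
the same snapshot, at the cost of having to retry on a serialization failure:

  BEGIN;
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  -- the DELETE now cannot touch a connection row that the INSERT ... SELECT
  -- did not copy, because both statements see the same snapshot
  INSERT INTO archive (user_id, chatroom_id, started_at, ended_at)
      SELECT user_id, chatroom_id, started_at, now()
        FROM active WHERE user_id = 42;
  DELETE FROM active WHERE user_id = 42;
  COMMIT;
  -- if a concurrent transaction updates or deletes one of these rows first,
  -- this transaction aborts with a serialization error and must be retried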
On 1/14/05 12:47 PM, "Frank D. Engel, Jr." <fde101@fjrhome.net> wrote:

> It's probably too messy to be worthwhile this way, though. More trouble
> than it would be worth.

It would be rather useful if there was a way to get a reasonably accurate
count (better than analyze provides) in a very short period. When you've got
a relatively wide table that has hundreds of millions to over a billion rows
and you need to report on how many rows are in the table, that can take a
long time.

Wes
> BEGIN;
> INSERT INTO archive (...) SELECT ... FROM active WHERE user_id = ...;
> DELETE FROM active WHERE user_id = ...;
> COMMIT;
>
> The DELETE can only delete the rows returned by the select, that's the
> whole point of transactions...

Well, in the case of having a unique index on user_id, and if no-one updates
the row between the insert and the delete, it will work ;) And if someone
updates it in between, well, the update is not archived, so you have to LOCK
your row FOR UPDATE. And if this procedure is called twice at the same time,
the rows will be historized twice... etc... Which is precisely why I don't
like this approach!

As a side note, I've installed 8.0 rc, and wow. The slow queries feel a lot
faster on the command prompt, and my small queries have become faster too...
very good work!

In the end, I've implemented it with an AFTER DELETE trigger on the 'live'
table, which, after the row has been deleted, inserts it in the history
table, using the magic variable OLD. This will work because the row is
already deleted, and thus can't be concurrently updated by another
transaction (because a transaction trying to update a row will wait on the
lock acquired by the DELETE, and vice versa). So, for ONE row at a time, in a
trigger, it works beautifully, thank you postgres!

I find the solution very elegant: when a session expires, it is deleted from
the live session table, then the trigger catches it just in time to shove it
into the sessions history table; then some other tables like user-to-chatroom
connections, which happen to have a user_id referencing into the live
sessions table, get the ON DELETE CASCADE and are also purged and historized
automatically. I am very happy with this solution, BUT it's done
one-row-at-a-time, so it's slower than I'd like! The key is to insert the row
deleted from the live table into the history table AFTER it has been deleted,
to avoid all the funky lock problems.

Now consider the following: you must DELETE several items at the same time
and historize them. If you INSERT then DELETE:

- Records can be inserted, updated or deleted between the two. The inserted
  ones will be historized but not deleted (duplicates!), the deleted ones
  will be lost forever, unhistorized, and the updated ones won't have their
  updates historized. Not very good for concurrency!
- You can LOCK FOR UPDATE before; this solves the UPDATE and DELETE problems,
  but not the INSERT problem.
- You can, of course, lock the entire table, but well, it reminds me too much
  of the MySQL way.
- You can also use SERIALIZABLE mode, which solves all the problems, but if
  something goes wrong, everything fails and you have to retry the whole
  transaction, whereas a proper lock would just be waited on...
- If there is a primary key in the 'live' table you can SELECT FOR UPDATE
  into a temporary table, then delete using the pkeys in the temp table, then
  insert from the temp table... ugly!

That's why I bother you to have the possibility of DELETE returning the
DELETEd rows ;) It's not very useful if you process one row, but when you
process several at a time, it would be really great, because instead of 2*N
queries (DELETE + INSERT hidden in a trigger) you'd just do one
(INSERT ... DELETE AND SELECT ... FROM ...). Today, if you don't want to do
it in a trigger, you have to have a unique index, SELECT FOR UPDATE, INSERT,
DELETE - that's three queries per row.
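A minimal sketch of the AFTER DELETE trigger described above, with
hypothetical table and column names:

  CREATE OR REPLACE FUNCTION archive_deleted_session() RETURNS trigger AS '
  BEGIN
      -- OLD carries the row that has just been deleted from the live table
      INSERT INTO sessions_history (user_id, chatroom_id, started_at, ended_at)
          VALUES (OLD.user_id, OLD.chatroom_id, OLD.started_at, now());
      RETURN OLD;
  END;
  ' LANGUAGE plpgsql;

  CREATE TRIGGER sessions_archive
      AFTER DELETE ON active_sessions
      FOR EACH ROW EXECUTE PROCEDURE archive_deleted_session();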
In a perfect world, this would then be used in an ON DELETE RULE which would
replace the DELETEs with deletes inserting the rows into the history table.

Also, I've thought about some other interesting applications: if DELETE
returns rows, why not UPDATE or even INSERT? Many applications use INSERT...
then SELECT currval(sequence). I also like to set defaults in the database,
like, for instance, some rows which have timestamp fields defaulting to now()
or things like that. I have a tree table with an ltree field which is
generated by a trigger from the parent's path and the current row's id; some
other fields are also inherited from the parent.

Why not do INSERT INTO ... AND SELECT ..., which would return the sequence
field and any other fields which have been initialized by ON INSERT
triggers... this would be neat... instead of INSERT, SELECT currval,
SELECT .. FROM table WHERE id=... Same thing for ON UPDATE triggers. You
could replace some plpgsql procedures with one query, and, what's more
important, not worry about locking headaches.

Anyway, my problem is solved now with triggers, but I like the idea very much
(and Oracle has it) (and Tom once said a DELETE was just more or less like a
SELECT)... so...

Regards
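The round trips PFC would like to collapse look roughly like this today;
the table, column and sequence names are hypothetical, and id is assumed to
be a serial column so currval() works after the insert:

  INSERT INTO tree (parent_id, name) VALUES (1, 'child');
  SELECT currval('tree_id_seq');                   -- fetch the generated id
  SELECT path FROM tree                            -- fetch fields filled in
   WHERE id = currval('tree_id_seq');              -- by ON INSERT triggers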
"Frank D. Engel, Jr." <fde101@fjrhome.net> writes: > Yep, that could cause problems. Okay, now I'm joining the program. > > The only thing I can see that would fix this > ... There are well understood mechanisms to fix this. It's a "SMOP" or "simple matter of programming". What you would do is insert into a summary table a record that indicates how many records you've inserted into the master table. Periodically you have some daemon collect up those records and replace them with a single record. But this can be done already by hand and it's not clear having the database do it automatically is necessarily a good idea. It would impose a cost on every insert when most of the time it wouldn't be useful. Moreover this is just a special case of a general problem called "materialized views". If it were added to the database it would probably be more worthwhile implementing a more general feature that could handle other aggregate functions besides count(*) as well as other types of queries besides simple unqualified aggregates. -- greg