Thread: select count(*) on large tables
Hello, I've followed the last discussion about the particular case of "select count(*)"s on large tables being somewhat slow. I've seen also this issue already on the todo list, so I know it is not a simple question. This problem arises for me on very large tables, which I mean starting from 1 million rows and above. The alternative solution I tried, that has an optimal speed up, unfortunately is not a way out, and it is based on "EXPLAIN SELECT count(*)" output parsing, which is obviously *not* reliable. The times always get better doing a vacuum (and eventually reindex) of the table, and they slowly lower again. Is there an estimate time for this issue to be resolved? Can I help in some way (code, test cases, ...)? -- Cosimo
On Thu, 8 Apr 2004, Cosimo Streppone wrote: > The alternative solution I tried, that has an optimal > speed up, unfortunately is not a way out, and it is based > on "EXPLAIN SELECT count(*)" output parsing, which > is obviously *not* reliable. Try this to get the estimate: SELECT relname, reltuples from pg_class order by relname; > The times always get better doing a vacuum (and eventually > reindex) of the table, and they slowly lower again. Yes, the estimate is updated by the analyze. > Is there an estimate time for this issue to be resolved? It's not so easy to "fix". The naive fixes makes other operations slower, most notably makes things less concurrent which is bad since it wont scale as good for many users then. You can always keep the count yourself and have some triggers that update the count on each insert and delete on the table. It will of course make all inserts and deletes slower, but if you count all rows often maybe it's worth it. Most people do not need to count all rows in a table anyway. You usually count all rows such as this and that (some condition). -- /Dennis Björklund
Cosimo Streppone <cosimo@streppone.it> writes: > Is there an estimate time for this issue to be resolved? Approximately never. It's a fundamental feature of Postgres' design. As noted by Dennis, you can look at the pg_class statistics if a recent estimate is good enough, or you can build user-level tracking tools if you'd rather have fast count(*) than concurrent update capability. But don't sit around waiting for the developers to "fix this bug", because it isn't a bug and it isn't going to be fixed. regards, tom lane