Re: Yet Another COUNT(*)...WHERE...question - Mailing list pgsql-general
| From | Bill Moran | 
|---|---|
| Subject | Re: Yet Another COUNT(*)...WHERE...question | 
| Date | |
| Msg-id | 20070815115738.352012cb.wmoran@potentialtech.com Whole thread Raw | 
| In response to | Re: Yet Another COUNT(*)...WHERE...question ("Phoenix Kiula" <phoenix.kiula@gmail.com>) | 
| List | pgsql-general | 
In response to "Phoenix Kiula" <phoenix.kiula@gmail.com>: > > Yes, optimization. :) You don't need an exact count to tell someone > > that there's more data and they can go to it. > > > In general, I agree. But my example of Amazon was only to illustrate > the point about two queries and why they may be needed. I seem to see > many more pages than you do, but in any case, Google and Amazon can > afford to be less precise. > > Thanks for the suggestion of using EXPLAIN and parsing an > approximation, but when you need to show a trader how many trades he > has made, for instance, then approximation is not a possibility at > all. Especially not if the numbers sway so wildly -- > > > FIRSTDB=# explain select * from trades where t_id = 'kXjha'; > QUERY PLAN > ----------------------------------------------------------------------------------- > Bitmap Heap Scan on trades (cost=15.77..1447.12 rows=374 width=224) > Recheck Cond: ((t_id)::text = 'kXjha'::text) > -> Bitmap Index Scan on trades_tid_date (cost=0.00..15.67 rows=374 width=0) > Index Cond: ((t_id)::text = 'kXjha'::text) > (4 rows) > > FIRSTDB=# select count(*) from trades where t_id = 'kXjha'; > count > ------- > 3891 > (1 row) > > > > Could I do something so that the EXPLAIN showed up with slightly more > close-to-accurate stats? The above query is just after a "vacuum > analyze"! In the above case, you could probably materialize the data with a trigger that updates a counter in a separate table every time a new trade is added. This will give you 100% accurate results with _very_ fast response time. Part of the problem is that there's no one answer to your question, there are multiple approaches to solving it, depending on the details of the problem and the acceptable time/accuracy of the answers. Some basic approaches: 1) Materialize the data. MySQL actually does this automatically for you with MyISAM tables, which is why count(*) is so fast. But if you absolutely need fast, accurate counts, you can build your own triggers in PG. This is unlikely to be practical with all queries. 2) Estimate. The accuracy of estimates can vary wildly by query and how often the database is analyzed, etc. For something like, "show results 1 - 10 of about 50,000", estimates are great and fast, but for other cases, not acceptable. The good news is you can get a fast estimate from any query with no up-front work. 3) Accept that sometimes to get accurate answers it's going to take time. Around here, we call it the "Orbitz" technique, because when we discuss it, everyone thinks of the "please wait while I process your query" page you get from orbitz.com. You'd be surprised how willing your users are to wait, as long as they know they have to wait. 4) Throw more hardware at it. If you absolutely _must_have_ super- accurate results faster, then you may need to buy more RAM, faster disks and faster CPUs to accomplish it. 5) Come up with something revolutionary that nobody's every thought of before. Good luck with this one. Of course, all of these ideas are only practical if you've already ensured that your system is properly tuned. Crappy values for shared_buffers and other tuning will lead you to waste time trying to redesign something that should work just fine, so verify all your configuration first. You may be able to get more acceptable estimates by increasing your statistics targets, for example. -- Bill Moran http://www.potentialtech.com
pgsql-general by date: