Re: [SQL] Yet Another (Simple) Case of Index not used - Mailing list pgsql-performance
From | Kevin Brown |
---|---|
Subject | Re: [SQL] Yet Another (Simple) Case of Index not used |
Date | |
Msg-id | 20030421004630.GL1847@filer Whole thread Raw |
In response to | Re: [SQL] Yet Another (Simple) Case of Index not used (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [SQL] Yet Another (Simple) Case of Index not used
Re: [SQL] Yet Another (Simple) Case of Index not used |
List | pgsql-performance |
Tom Lane wrote: > Kevin Brown <kevin@sysexperts.com> writes: > > This is why I suspect the best way to manage this would be to manage > > the counter itself using the MVCC mechanism (that is, you treat the > > shared counter as a row in a table just like any other and, in fact, > > it might be most beneficial for it to actually be exactly that), which > > handles the visibility problem automatically. But I don't know how > > much contention there would be as a result. > > Hm. Contention probably wouldn't be the killer, since if transactions > don't try to update the count until they are about to commit, they won't > be holding the row lock for long. (You'd have to beware of deadlocks > between transactions that need to update multiple counters, but that > seems soluble.) What *would* be a problem is that such counter tables > would accumulate huge numbers of dead rows very quickly, making it > inefficient to find the live row. But that inefficiency is a problem for *all* oft-updated tables, is it not? I know that you'll end up with an additional n tuples per transaction (where n is the average number of tables inserted into or deleted from per transaction), so this isn't an insignificant problem, but it's one faced by any application that often updates a small table. Causing a transaction which is already doing inserts/deletes to take the hit of doing one additional update doesn't seem to me to be a particularly large sacrifice, especially since the table it's updating (the one that contains the counts) is likely to be cached in its entirety. The chances are reasonable that the other activity the transaction is performing will dwarf the additional effort that maintaining the count demands. > Josh already mentioned this as a problem with user-trigger-based > counting. Right, but the trigger based mechanism probably magnifies the issue by orders of magnitude, and thus can't necessarily be used as an argument against an internally-implemented method. > You could stanch the bleeding with sufficiently frequent vacuums, > perhaps, but it just doesn't look very appealing. I would say this is more a strong argument for automatic VACUUM management than against count management, because what you say here is true of any oft-updated, oft-referenced table. > Ultimately what this comes down to is "how much overhead are we willing > to load onto all other operations in order to make SELECT-COUNT(*)-with- > no-WHERE-clause fast"? Postgres has made a set of design choices that > favor the other operations. If you've designed an application that > lives or dies by fast COUNT(*), perhaps you should choose another > database. Or perhaps a mechanism similar to the one being discussed should be implemented and controlled with a GUC variable, so instead of forcing someone to choose another database you force them to choose between the performance tradeoffs involved. We already give DBAs such choices elsewhere, e.g. pg_stat_activity. The real question in all this is whether or not fast COUNT(*) operations are needed often enough to even justify implementing a mechanism to make them possible in PG. The question of getting fast answers from COUNT(*) comes up often enough to be a FAQ, and that suggests that there's enough demand for the feature that it may be worth implementing just to shut those asking for it up. :-) Personally, I'd rather see such development effort go towards more beneficial improvements, such as replication, 2PC, SQL/MED, etc. (or even improving the efficiency of MVCC, since it was mentioned here as a problem! :-). I consider COUNT(*) without a WHERE clause to be a corner case, despite the frequency of questions about it. But I don't think we should reject a patch to implement fast COUNT(*) just because it represents a performance tradeoff, at least if it's GUC-controlled. -- Kevin Brown kevin@sysexperts.com
pgsql-performance by date: