Re: [SQL] Yet Another (Simple) Case of Index not used - Mailing list pgsql-performance

From Kevin Brown
Subject Re: [SQL] Yet Another (Simple) Case of Index not used
Date
Msg-id 20030421004630.GL1847@filer
Whole thread Raw
In response to Re: [SQL] Yet Another (Simple) Case of Index not used  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [SQL] Yet Another (Simple) Case of Index not used
Re: [SQL] Yet Another (Simple) Case of Index not used
List pgsql-performance
Tom Lane wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> > This is why I suspect the best way to manage this would be to manage
> > the counter itself using the MVCC mechanism (that is, you treat the
> > shared counter as a row in a table just like any other and, in fact,
> > it might be most beneficial for it to actually be exactly that), which
> > handles the visibility problem automatically.  But I don't know how
> > much contention there would be as a result.
>
> Hm.  Contention probably wouldn't be the killer, since if transactions
> don't try to update the count until they are about to commit, they won't
> be holding the row lock for long.  (You'd have to beware of deadlocks
> between transactions that need to update multiple counters, but that
> seems soluble.)  What *would* be a problem is that such counter tables
> would accumulate huge numbers of dead rows very quickly, making it
> inefficient to find the live row.

But that inefficiency is a problem for *all* oft-updated tables, is it
not?  I know that you'll end up with an additional n tuples per
transaction (where n is the average number of tables inserted into or
deleted from per transaction), so this isn't an insignificant problem,
but it's one faced by any application that often updates a small
table.

Causing a transaction which is already doing inserts/deletes to take
the hit of doing one additional update doesn't seem to me to be a
particularly large sacrifice, especially since the table it's updating
(the one that contains the counts) is likely to be cached in its
entirety.  The chances are reasonable that the other activity the
transaction is performing will dwarf the additional effort that
maintaining the count demands.

> Josh already mentioned this as a problem with user-trigger-based
> counting.

Right, but the trigger based mechanism probably magnifies the issue by
orders of magnitude, and thus can't necessarily be used as an argument
against an internally-implemented method.

> You could stanch the bleeding with sufficiently frequent vacuums,
> perhaps, but it just doesn't look very appealing.

I would say this is more a strong argument for automatic VACUUM
management than against count management, because what you say here is
true of any oft-updated, oft-referenced table.

> Ultimately what this comes down to is "how much overhead are we willing
> to load onto all other operations in order to make SELECT-COUNT(*)-with-
> no-WHERE-clause fast"?  Postgres has made a set of design choices that
> favor the other operations.  If you've designed an application that
> lives or dies by fast COUNT(*), perhaps you should choose another
> database.

Or perhaps a mechanism similar to the one being discussed should be
implemented and controlled with a GUC variable, so instead of forcing
someone to choose another database you force them to choose between
the performance tradeoffs involved.  We already give DBAs such choices
elsewhere, e.g. pg_stat_activity.

The real question in all this is whether or not fast COUNT(*)
operations are needed often enough to even justify implementing a
mechanism to make them possible in PG.  The question of getting fast
answers from COUNT(*) comes up often enough to be a FAQ, and that
suggests that there's enough demand for the feature that it may be
worth implementing just to shut those asking for it up.  :-)

Personally, I'd rather see such development effort go towards more
beneficial improvements, such as replication, 2PC, SQL/MED, etc. (or
even improving the efficiency of MVCC, since it was mentioned here as
a problem! :-).  I consider COUNT(*) without a WHERE clause to be a
corner case, despite the frequency of questions about it.  But I don't
think we should reject a patch to implement fast COUNT(*) just because
it represents a performance tradeoff, at least if it's GUC-controlled.


--
Kevin Brown                          kevin@sysexperts.com


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: problems
Next
From: Tom Lane
Date:
Subject: Re: [SQL] Yet Another (Simple) Case of Index not used