Tom Lane wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> > Josh Berkus wrote:
> >> They did incorporate a lot of MVCC logic into InnoDB tables, yes.
> >> Which means that if SELECT count(*) on an InnoDB table is just as
> >> fast as a MyISAM table, then it is not accurate.
>
> > This is not necessarily true. The trigger-based approach to tracking
> > the current number of rows in a table might well be implemented
> > internally, and that may actually be much faster than doing it using
> > triggers
>
> You missed the point of Josh's comment: in an MVCC system, the correct
> COUNT() varies depending on which transaction is asking. Therefore it
> is not possible for a centrally maintained row counter to give accurate
> results to everybody, no matter how cheap it is to maintain.
Hmm...true...but only if you really implement it as a faithful copy of
the trigger-based method. Implementing it on the backend brings some
advantages to the table, to wit:
* The individual transactions don't need to update the
externally-visible count on every insert or delete, they only need
to update it at commit time.
* The transaction can keep a count of the number of inserted and
deleted tuples it generates (on a per-table basis) during the life
of the transaction. The count value it returns to a client is the
count value it reads from the table that stores the count value plus
any differences that have been applied during the transaction. This
is fast, because the backend handling the transaction can keep this
difference value in its own private memory.
* When a transaction commits, it only needs to apply the "diff value"
it stores internally to the external count value.
Contention on the count value is only an issue if the external count
value is currently being written to by a transaction in the commit
phase. But the only time a transaction will be interested in reading
that value is when it's performing a count(*) operation or when it's
committing inserts/deletes that happened on the table in question (and
then only if the number of tuples inserted differs from the number
deleted). So the total amount of contention should be relatively low.
> (The cheapness can be disputed as well, since it creates a single point
> of contention for all inserts and deletes on the table. But that's a
> different topic.)
That's true, but the single point of contention is only an issue at
transaction commit time (unless you're implementing READ UNCOMMITTED),
at least if you do something like what I described above.
--
Kevin Brown kevin@sysexperts.com