"Jim C. Nasby" <jnasby@pervasive.com> writes:
> > The numbers I've been looking at lately say that heavy lock traffic is
> > expensive, particularly on SMP machines, even with zero contention.
> > Seems the cache coherency protocol costs a lot even when it's not doing
> > anything...
>
> Are there any ways we could avoid the locking?
Well, if all we want to do is reproduce the current functionality of EXPLAIN
ANALYZE, all you need is a single sig_atomic_t that you store the address
of the current node in. When you're done with the query you ask the central
statistics accumulator for the counts of how many times it saw each node
address.
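A rough sketch of what I mean, not a worked-out design. Everything named here
(MAX_NODES, enter_node, take_sample, and so on) is made up for illustration.
One assumption to flag loudly: sig_atomic_t is not guaranteed to be wide enough
to hold a pointer, so this sketch stores a small integer node id rather than
the node's address. The sampler is shown as a plain function call; in a real
setup it would run in the separate accumulator process on a timer.

```c
#include <assert.h>
#include <signal.h>

#define MAX_NODES 64            /* hypothetical upper bound on plan nodes */

/* Written by the executor, read by the sampler. -1 means "no node active". */
static volatile sig_atomic_t current_node = -1;

/* Per-node sample counts, owned by the accumulator side only. */
static unsigned long node_samples[MAX_NODES];

/* Executor side: a single plain store when control enters a plan node. */
void enter_node(int node_id)
{
    assert(node_id >= 0 && node_id < MAX_NODES);
    current_node = node_id;
}

/* Executor side: mark that no node is running (e.g. at end of query). */
void leave_all_nodes(void)
{
    current_node = -1;
}

/* Accumulator side: called once per sampling tick. */
void take_sample(void)
{
    sig_atomic_t seen = current_node;   /* one read, no lock taken */

    if (seen >= 0 && seen < MAX_NODES)
        node_samples[seen]++;
}

/* Asked for at the end of the query: how often was this node seen? */
unsigned long samples_for(int node_id)
{
    assert(node_id >= 0 && node_id < MAX_NODES);
    return node_samples[node_id];
}
```

The sample counts are only a statistical estimate of where time went, which is
a different thing from the exact per-node timings EXPLAIN ANALYZE reports now.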
But that's sort of skipping some steps. The reason TAS is slow is that it
needs to ensure cache coherency. Even just storing an int into shared memory
when its cache line is owned by the processor running the stats accumulator
would have the same overhead.
That said, I don't think it would actually be as much. TAS has to ensure no
other processor overwrites the data from this processor. Whereas merely
storing into a sig_atomic_t just has to ensure that you don't get a partially
written integer.
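To make the distinction concrete, here is a small sketch using C11
<stdatomic.h>. The function and variable names are hypothetical, and this is
not how the existing spinlock code is written. The first pair is a
test-and-set style lock, which needs an atomic read-modify-write; the second
pair is a plain untorn store and load with relaxed ordering. Both still have
to move the cache line between processors, so this only illustrates the
difference in what is being asked of the hardware and makes no claim about
measured cost.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static atomic_int  published_node;      /* static storage, so starts at 0 */

/* TAS: atomically read the old value and set the flag in one step.
 * Returns true if we acquired the lock (the flag was previously clear). */
bool try_lock(void)
{
    return !atomic_flag_test_and_set_explicit(&lock_flag,
                                              memory_order_acquire);
}

/* Release the lock taken by try_lock(). */
void unlock(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

/* Plain store: only guarantees the reader never sees a torn value.
 * Nothing prevents another writer from overwriting it afterwards. */
void publish_node(int node_id)
{
    assert(node_id >= 0);
    atomic_store_explicit(&published_node, node_id, memory_order_relaxed);
}

/* Reader side: sees either the old or the new value, never a mix. */
int read_node(void)
{
    return atomic_load_explicit(&published_node, memory_order_relaxed);
}
```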
And it seems sort of silly to be doing this much work just to reproduce the
current functionality, even if it's lower overhead. The big advantage of a
separate process would be that it could do much more. But that would require
pushing some representation of the plan out to the stats collector, and
storing more information about the state than just which node you're at
currently, such as whether you're in the midst of index lookups, sequential
scans, operator and function execution, etc.
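One way the extra state might be carried without giving up the single-store
property is to pack a node id and an activity code into one integer. This is
purely a sketch under my own assumptions: the enum values, the 4-bit split,
and the function names are all invented here, and it says nothing about how
the plan itself would be shipped to the collector.

```c
#include <assert.h>

/* Hypothetical activity codes, following the examples listed above. */
enum activity {
    ACT_IDLE = 0,
    ACT_INDEX_LOOKUP,
    ACT_SEQ_SCAN,
    ACT_OPERATOR,
    ACT_FUNCTION
};

#define ACT_BITS 4u                          /* low bits hold the activity */
#define ACT_MASK ((1u << ACT_BITS) - 1u)

/* Combine node id and activity into one word, so that publishing the
 * state is still a single integer store. */
unsigned int pack_state(unsigned int node_id, enum activity act)
{
    assert(node_id <= (~0u >> ACT_BITS));    /* id must fit in the high bits */
    return (node_id << ACT_BITS) | ((unsigned int) act & ACT_MASK);
}

/* Collector side: recover the node id from a sampled word. */
unsigned int state_node(unsigned int state)
{
    return state >> ACT_BITS;
}

/* Collector side: recover the activity code from a sampled word. */
enum activity state_activity(unsigned int state)
{
    return (enum activity) (state & ACT_MASK);
}
```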
--
greg