Greg Stark <stark@mit.edu> writes:
> But I'm not entirely convinced any of this is actually useful. Just
> because the transition table is large doesn't mean it's inefficient.
That's a fair point. However, I've often noticed base_yyparse() showing
up rather high on profiles, higher than seemed plausible at the time
given that its state-machine implementation is pretty tight. Now I'm
wondering whether that cost comes from cache stalls incurred while
touching all the requisite parts of the transition table.
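base_yyparse() is generated by Bison, which by default stores its parse tables in a comb-compressed form: a per-state offset array, a shared action array, and a parallel "check" array that says which state owns each slot. The C sketch below only illustrates why one parse step can touch several unrelated pieces of memory. The names (toy_pact, toy_check, toy_table, toy_defact, toy_action) and the tiny table contents are hypothetical stand-ins modeled loosely on Bison's yypact/yycheck/yytable/yydefact, not the real generated code, and the action codes carry no Bison meaning.

```c
#include <assert.h>

/* Hypothetical toy tables in the style of Bison's comb-compressed
 * action tables. Real grammars have thousands of entries; these are
 * tiny so the lookup logic stays readable. */
#define TOY_NSTATES   4
#define TOY_LAST      7      /* highest valid index into toy_check/toy_table */
#define TOY_PACT_NINF (-100) /* marker: state has only a default action */

/* Per-state base offset into the shared toy_check/toy_table arrays. */
static const int toy_pact[TOY_NSTATES] = { 0, 3, TOY_PACT_NINF, 5 };
/* Token that owns each shared slot; rows of different states overlap
 * (state 1 and state 3 both map onto index 5), which is what makes
 * the compression work. */
static const int toy_check[TOY_LAST + 1] = { 0, 1, 2, 0, 1, 0, 1, 2 };
/* Action code stored in each shared slot (hypothetical values). */
static const int toy_table[TOY_LAST + 1] = { 10, 11, 12, 13, 14, 15, 16, 17 };
/* Fallback action per state when the slot lookup misses. */
static const int toy_defact[TOY_NSTATES] = { -1, -2, -3, -4 };

/* Return the action for (state, token). One call reads toy_pact[state],
 * then toy_check and toy_table at a data-dependent index, and on a miss
 * toy_defact[state]. With full-size tables these reads can fall on
 * unrelated cache lines, so each token may cost several cache misses. */
static int toy_action(int state, int token)
{
    assert(state >= 0 && state < TOY_NSTATES);

    int n = toy_pact[state];
    if (n == TOY_PACT_NINF)
        return toy_defact[state];      /* no explicit row for this state */

    n += token;                        /* index into the shared arrays */
    if (n < 0 || n > TOY_LAST || toy_check[n] != token)
        return toy_defact[state];      /* slot belongs to some other state */

    return toy_table[n];
}
```

For example, toy_action(1, 2) lands on index 5, which is owned by state 3's entry for token 0, so the check fails and the default for state 1 is used. In a real grammar each of those arrays is large, which is consistent with (though not proof of) the cache-stall theory above; a profiler that reports cache misses per source line would be the way to confirm it.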
> valgrind comes with a tool called cachegrind which can emulate the
> cache algorithm on some variants of various cpus and produce reports.
> Can it be made to produce a report for a specific block of memory?
I believe that oprofile can be persuaded to produce statistics about
where in one's code the most cache misses occur, not just where the most
wall-clock ticks are spent; that would shed a lot of light on this
question. However, my oprofile-fu doesn't quite extend to actually
persuading it.
regards, tom lane