Jan Wieck <janwieck@yahoo.com> writes:
> And, before you discover this one: The backends send their
> statistic collection information via UDP packets. In the case
> of heavy database load, some of these packets can get lost so
> that the statistics will not be 100% accurate.
Recently the SourceForge DBAs got quite confused by this: under load,
the pg_stats_activity view would show query-in-progress entries for
backends that were not only not busy any more, but actually had
terminated long since. It took awhile to realize that this was pgstats
operating as designed and not a symptom of serious problems.
Although it's okay for pg_stats to lag the true state of affairs by
some amount of time, it's not good for a view that claims to be
current state to be wrong for indefinitely long periods. Would it
be possible to improve the reliability of transmission of backend-quit
messages somehow?
One idea that comes to mind is for pgstats to look through the shared
memory PROC list occasionally to see if its idea of active processes
still matches reality. Both idle and dead processes could be reliably
detected that way; also, we could detect busy (or at least
in-a-transaction) processes and change their viewable state to
"<unknown query>" if we hadn't gotten any query text from them.
Interestingly, this approach would allow a somewhat useful
pg_stats_activity view to be maintained even without *any* messages
transmitted by backends.
regards, tom lane