a) monitor and show total lock time of living queries
b) monitor and log total lock time of executed queries.
I am interested by @b now. When we work with slow query log, then we would to identify reason for long duration. Locks are important source of these queries on some systems.
I'm interested in b, too. I was thinking of proposing a performance diagnostics feature like Oracle's wait events (V$SYSTEM_EVENT and V$SESSION_EVENT). So, if you do this, I'd like to contribute to the functional design, code and doc review, and testing.
isn't it too heavy?
I have just terrible negative experience with Vertica, where this design is used - almost all information about queries are available, but any query to related tables are terrible slow, so I am inclined to more simple design oriented to log based solution. Table based solutions is not practical when you exec billions queries per day. I understand to motivation, but I afraid so it can be very expensive and slow on highly load servers.
The point is to collect as much information about bottlenecks as possible, including lock waits. The rough sketch is:
What info to collect: * heavyweight lock waits shown by pg_locks * lightweight lock waits * latch waits * socket waits (mainly for client input)
How the info is delivered: * pg_stat_system_events shows the accumulated total accross the server instance * pg_stat_session_events shows the accumulated total for each session * EXPLAIN ANALYZE and auto_explain shows the accumulated total for each query
We need to describe in the manual how to diagnose and tne the system with these event info.