Re: proposal for 9.5: monitoring lock time for slow queries - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: proposal for 9.5: monitoring lock time for slow queries
Date
Msg-id CAFj8pRBfKAiL0inY0ZvFCpsRFzQg81vtB3OF2bQN-JPBqCeWeA@mail.gmail.com
Whole thread Raw
In response to Re: proposal for 9.5: monitoring lock time for slow queries  ("MauMau" <maumau307@gmail.com>)
Responses Re: proposal for 9.5: monitoring lock time for slow queries
List pgsql-hackers



2014-08-13 11:14 GMT+02:00 MauMau <maumau307@gmail.com>:
From: "Pavel Stehule" <pavel.stehule@gmail.com>

There are two relative independent tasks

a) monitor and show total lock time of living queries

b) monitor and log total lock time of executed queries.

I am interested by @b now. When we work with slow query log, then we would
to identify reason for long duration. Locks are important source of these
queries on some systems.

I'm interested in b, too.  I was thinking of proposing a performance diagnostics feature like Oracle's wait events (V$SYSTEM_EVENT and V$SESSION_EVENT).  So, if you do this, I'd like to contribute to the functional design, code and doc review, and testing.

isn't it too heavy?

I have just terrible negative experience with Vertica, where this design is used - almost all information about queries are available, but any query to related tables are terrible slow, so I am inclined to more simple design oriented to log based solution. Table based solutions is not practical when you exec billions queries per day. I understand to motivation, but I afraid so it can be very expensive and slow on highly load servers.
 

The point is to collect as much information about bottlenecks as possible, including lock waits.  The rough sketch is:

What info to collect:
* heavyweight lock waits shown by pg_locks
* lightweight lock waits
* latch waits
* socket waits (mainly for client input)


 

How the info is delivered:
* pg_stat_system_events shows the accumulated total accross the server instance
* pg_stat_session_events shows the accumulated total for each session
* EXPLAIN ANALYZE and auto_explain shows the accumulated total for each query

We need to describe in the manual how to diagnose and tne the system with these event info.

Regards
MauMau


pgsql-hackers by date:

Previous
From: "MauMau"
Date:
Subject: Re: proposal for 9.5: monitoring lock time for slow queries
Next
From: David Rowley
Date:
Subject: Re: strncpy is not a safe version of strcpy