Re: RFC: replace pg_stat_activity.waiting with something more descriptive - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: RFC: replace pg_stat_activity.waiting with something more descriptive |
Date | |
Msg-id | CAHGQGwFjQ3pmv8Yeknxtz4G=ntZRqP6NHwrqkcSGpRQufaboJA@mail.gmail.com |
In response to | Re: RFC: replace pg_stat_activity.waiting with something more descriptive (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: RFC: replace pg_stat_activity.waiting with something more descriptive |
List | pgsql-hackers |
On Fri, Jun 26, 2015 at 12:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jun 25, 2015 at 9:23 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> On 6/22/15 1:37 PM, Robert Haas wrote:
>>> Currently, the only time we report a process as waiting is when it is
>>> waiting for a heavyweight lock. I'd like to make that somewhat more
>>> fine-grained, by reporting the type of heavyweight lock it's awaiting
>>> (relation, relation extension, transaction, etc.). Also, I'd like to
>>> report when we're waiting for a lwlock, and report either the specific
>>> fixed lwlock for which we are waiting, or else the type of lock (lock
>>> manager lock, buffer content lock, etc.) for locks of which there is
>>> more than one. I'm less sure about this next part, but I think we
>>> might also want to report ourselves as waiting when we are doing an OS
>>> read or an OS write, because it's pretty common for people to think
>>> that a PostgreSQL bug is to blame when in fact it's the operating
>>> system that isn't servicing our I/O requests very quickly.
>>
>> Could that also cover waiting on network?
>
> Possibly. My approach requires that the number of wait states be kept
> relatively small, ideally fitting in a single byte. And it also
> requires that we insert pgstat_report_waiting() calls around the thing
> that is notionally blocking. So, if there are a small number of
> places in the code where we do network I/O, we could stick those calls
> around those places, and this would work just fine. But if a foreign
> data wrapper, or any other piece of code, does network I/O - or any
> other blocking operation - without calling pgstat_report_waiting(), we
> just won't know about it.

Itagaki-san's very similar proposal and patch would probably be useful
when considering which wait events to track.
http://www.postgresql.org/message-id/20090309125146.913C.52131E4D@oss.ntt.co.jp

According to his patch, the wait events that he was planning to add were:

+ typedef enum PgCondition
+ {
+     PGCOND_UNUSED = 0,                  /* unused */
+
+     /* 10000 - CPU */
+     PGCOND_CPU = 10000,                 /* generic cpu operations */
+     /* 11000 - CPU:PARSE */
+     PGCOND_CPU_PARSE = 11000,           /* pg_parse_query */
+     PGCOND_CPU_PARSE_ANALYZE = 11100,   /* parse_analyze */
+     /* 12000 - CPU:REWRITE */
+     PGCOND_CPU_REWRITE = 12000,         /* pg_rewrite_query */
+     /* 13000 - CPU:PLAN */
+     PGCOND_CPU_PLAN = 13000,            /* pg_plan_query */
+     /* 14000 - CPU:EXECUTE */
+     PGCOND_CPU_EXECUTE = 14000,         /* PortalRun or PortalRunMulti */
+     PGCOND_CPU_TRIGGER = 14100,         /* ExecCallTriggerFunc */
+     PGCOND_CPU_SORT = 14200,            /* (generic sort operation) */
+     PGCOND_CPU_SORT_HEAP = 14210,       /* tuplesort_begin_heap */
+     PGCOND_CPU_SORT_INDEX = 14220,      /* tuplesort_begin_index_btree */
+     PGCOND_CPU_SORT_DATUM = 14230,      /* tuplesort_begin_datum */
+     /* 15000 - CPU:UTILITY */
+     PGCOND_CPU_UTILITY = 15000,         /* ProcessUtility */
+     PGCOND_CPU_COMMIT = 15100,          /* CommitTransaction */
+     PGCOND_CPU_ROLLBACK = 15200,        /* AbortTransaction */
+     /* 16000 - CPU:TEXT */
+     PGCOND_CPU_TEXT = 16000,            /* (generic text operation) */
+     PGCOND_CPU_DECODE = 16100,          /* pg_client_to_server */
+     PGCOND_CPU_ENCODE = 16200,          /* pg_server_to_client */
+     PGCOND_CPU_LIKE = 16310,            /* GenericMatchText */
+     PGCOND_CPU_ILIKE = 16320,           /* Generic_Text_IC_like */
+     PGCOND_CPU_RE = 16400,              /* (generic regexp operation) */
+     PGCOND_CPU_RE_COMPILE = 16410,      /* RE_compile_and_cache */
+     PGCOND_CPU_RE_EXECUTE = 16420,      /* RE_execute */
+
+     /* 20000 - NETWORK */
+     PGCOND_NETWORK = 20000,             /* (generic network operation) */
+     PGCOND_NETWORK_RECV = 21000,        /* secure_read */
+     PGCOND_NETWORK_SEND = 22000,        /* secure_write */
+
+     /* 30000 - IDLE (should be larger than network to distinguish idle or recv) */
+     PGCOND_IDLE = 30000,                /* <IDLE> */
+     PGCOND_IDLE_IN_TRANSACTION = 31000, /* <IDLE> in transaction */
+     PGCOND_IDLE_SLEEP = 32000,          /* pg_usleep */
+
+     /* 40000 - XLOG */
+     PGCOND_XLOG = 40000,                /* (generic xlog operation) */
+     PGCOND_XLOG_CRC = 41000,            /* crc calculation in XLogInsert */
+     PGCOND_XLOG_INSERT = 42000,         /* insert in XLogInsert */
+     PGCOND_XLOG_OPEN = 43000,           /* XLogFileOpen */
+     PGCOND_XLOG_CLOSE = 44000,          /* XLogFileClose */
+     PGCOND_XLOG_WRITE = 45000,          /* write in XLogWrite */
+     PGCOND_XLOG_FLUSH = 46000,          /* issue_xlog_fsync */
+
+     /* 50000 - DATA */
+     PGCOND_DATA = 50000,                /* (generic data operation) */
+     PGCOND_DATA_CREATE = 51000,         /* smgrcreate */
+     PGCOND_DATA_OPEN = 52000,           /* smgropen */
+     PGCOND_DATA_CLOSE = 53000,          /* smgrclose */
+     PGCOND_DATA_STAT = 54000,           /* smgrnblocks */
+     PGCOND_DATA_READ = 55000,           /* smgrread */
+     PGCOND_DATA_PREFETCH = 56000,       /* smgrprefetch */
+     PGCOND_DATA_WRITE = 57000,          /* smgrwrite */
+     PGCOND_DATA_EXTEND = 58000,         /* smgrextend */
+
+     /* 60000 - TEMP */
+     PGCOND_TEMP = 60000,                /* (generic temp file operation) */
+     PGCOND_TEMP_READ = 61000,           /* BufFileRead */
+     PGCOND_TEMP_WRITE = 62000,          /* BufFileWrite */
+
+     /* 70000 - LOCK */
+     PGCOND_LOCK = 70000,                /* waiting on a lmgr lock */
+     /* 70001-70999 is reserved for lmgr locks */
+
+     /* 80000 - LWLOCK */
+     PGCOND_LWLOCK = 80000,              /* waiting on a generic lwlock */
+     /* 80001-80999 is reserved for named lwlocks */
+     PGCOND_LWLOCK_BUFMAPPING = 81000,   /* BufMappingLock(s) */
+     PGCOND_LWLOCK_LOCKMGR = 82000,      /* LockMgrLock(s) */
+     PGCOND_LWLOCK_PAGE = 83000,         /* BufferDesc.content_lock */
+     PGCOND_LWLOCK_IO = 84000,           /* BufferDesc.io_in_progress_lock */
+
+     /* 90000 - SPINLOCK */
+     PGCOND_SPINLOCK = 90000             /* timeout in s_lock */
+ } PgCondition;

Regards,

-- 
Fujii Masao
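
To illustrate the numbering scheme in the enum quoted above: the section comments suggest that the ten-thousands digit selects a top-level class (CPU, NETWORK, LOCK, ...) and the thousands digit a subclass (e.g. 14000 for CPU:EXECUTE). The following is only a rough sketch under that assumption. The helper names pgcond_class(), pgcond_subclass(), pgcond_class_name() and pgcond_selfcheck() are hypothetical and do not appear in the quoted patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helpers (not part of the quoted patch). They assume the
 * codes are hierarchical decimal numbers, as the section comments in the
 * enum suggest.
 */

/* Top-level class, e.g. 14210 -> 10000 (CPU), 83000 -> 80000 (LWLOCK). */
static int
pgcond_class(int cond)
{
    return (cond / 10000) * 10000;
}

/* Subclass, e.g. 14210 -> 14000 (CPU:EXECUTE). */
static int
pgcond_subclass(int cond)
{
    return (cond / 1000) * 1000;
}

/* Human-readable name of the top-level class. */
static const char *
pgcond_class_name(int cond)
{
    switch (pgcond_class(cond))
    {
        case 0:     return (cond == 0) ? "UNUSED" : "UNKNOWN";
        case 10000: return "CPU";
        case 20000: return "NETWORK";
        case 30000: return "IDLE";
        case 40000: return "XLOG";
        case 50000: return "DATA";
        case 60000: return "TEMP";
        case 70000: return "LOCK";
        case 80000: return "LWLOCK";
        case 90000: return "SPINLOCK";
        default:    return "UNKNOWN";
    }
}

/* Small self-check using values copied from the quoted enum. */
static void
pgcond_selfcheck(void)
{
    assert(pgcond_class(14210) == 10000);               /* PGCOND_CPU_SORT_HEAP */
    assert(pgcond_subclass(14210) == 14000);            /* CPU:EXECUTE */
    assert(strcmp(pgcond_class_name(21000), "NETWORK") == 0);
    assert(strcmp(pgcond_class_name(70001), "LOCK") == 0);
}
```

With these helpers, a monitoring view could group events by class without a lookup table per event. Note that the codes run up to 90000, so they would not fit in the single byte mentioned in the quoted reply without renumbering.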