Re: [Proposal] Add accumulated statistics for wait event - Mailing list pgsql-hackers
| From | Jehan-Guillaume de Rorthais | 
|---|---|
| Subject | Re: [Proposal] Add accumulated statistics for wait event | 
| Date | |
| Msg-id | 20210614161032.60172b5b@firost Whole thread Raw | 
| In response to | Re: [Proposal] Add accumulated statistics for wait event (Andres Freund <andres@anarazel.de>) | 
| Responses | Re: [Proposal] Add accumulated statistics for wait event | 
| List | pgsql-hackers | 
Hi Andres, Hi all, First, thank you for your feedback! Please find in attachment a patch implementing accumulated wait event stats only from the backend point of view. As I wrote when I reviewed and rebased the existing patch, I was uncomfortable with the global approach. I still volunteer to work/review the original approach is required. See bellow for comments and some more explanations about what I think might be improvements over the previous patch. On Fri, 11 Jun 2021 12:18:07 -0700 Andres Freund <andres@anarazel.de> wrote: > On 2021-06-05 00:53:44 +0200, Jehan-Guillaume de Rorthais wrote: > > From 88c2779679c5c9625ca5348eec0543daab5ccab4 Mon Sep 17 00:00:00 2001 > > From: Jehan-Guillaume de Rorthais <jgdr@dalibo.com> > > Date: Tue, 1 Jun 2021 13:25:57 +0200 > > Subject: [PATCH 1/2] Add pg_stat_waitaccum view. > > > > pg_stat_waitaccum shows counts and duration of each wait events. > > Each backend/backgrounds counts and measures the time of wait event > > in every pgstat_report_wait_start and pgstat_report_wait_end. They > > store those info into their local variables and send to Statistics > > Collector. We can get those info via Statistics Collector. > > > > For reducing overhead, I implemented statistic hash instead of > > dynamic hash. I also implemented track_wait_timing which > > determines wait event duration is collected or not. > > I object to adding this overhead. The whole selling point for wait > events was that they are low overhead. I since spent time reducing the > overhead further, because even just the branches for checking if > track_activity is enabled are measurable (225a22b19ed). Agree. The previous patch I rebased was to review it and reopen this discussion, I even added a small FIXME in pgstat_report_wait_end and pgstat_report_wait_start about your work: //FIXME: recent patch to speed up this call. In the patch in attachment, I tried to fix this by using kind of an internal hook for pgstat_report_wait_start and pgstat_report_wait_end. This allows to "instrument" wait events only when required, on the fly, dynamically. Moreover, I removed the hash structure for a simple static array for faster access. > > From ddb1adc5cd9acc9bc9de16d0cf057124b09fe1e3 Mon Sep 17 00:00:00 2001 > > From: Jehan-Guillaume de Rorthais <jgdr@dalibo.com> > > Date: Fri, 4 Jun 2021 18:14:51 +0200 > > Subject: [PATCH 2/2] [POC] Change measuring method of wait event time from > > INSTR_TIME to rdtsc. > > > > This patch changes measuring method of wait event time from INSTR_TIME > > (which uses gettimeofday or clock_gettime) to rdtsc. This might reduce the > > overhead of measuring overhead. > > > > Any supports like changing clock cycle to actual time or error handling are > > not currently implemented. > > rdtsc is a serializing (*) instruction - that's the expensive part. On linux > clock_gettime() doesn't actually need a syscall. While the vdso call > implies a bit of overhead over a raw rdtsc, it's a relatively small part > of the overhead. See > https://www.postgresql.org/message-id/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de I choose to remove all this rdtsc part from my patch as this wasn't clear how much faster it was compare to simpler vdso functions and how to accurately extract a human time. About my take on $subject, for the sake of simplicity of this PoC, I added instrumentation to log_statement_stats. Despite the query context of the reported log, they are really accumulated stats. The patch updated pg_stat_get_waitaccum() as well to be able to report the accumulated wait events from your interactive or batch session. So using my previous fe-time demo client, you can test it using: PGOPTIONS="--log_statement_stats=on" ./fe-time 100 From logs, I now have (notice the last line): LOG: duration: 3837.194 ms statement: SELECT * FROM pgbench_accounts LOG: QUERY STATISTICS DETAIL: ! system usage stats: ! 0.087444 s user, 0.002106 s system, 3.837202 s elapsed ! [0.087444 s user, 0.003974 s system total] ! 25860 kB max resident size ! 0/0 [0/0] filesystem blocks in/out ! 0/303 [0/697] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent ! 4/18 [5/18] voluntary/involuntary context switches ! Client/ClientWrite 4 calls, 3747102 us elapsed Using pgbench scale factor 10, the copy query for pgbench_accounts looks like: LOG: duration: 2388.081 ms statement: copy pgbench_accounts from stdin LOG: QUERY STATISTICS DETAIL: ! system usage stats: ! 1.373756 s user, 0.252860 s system, 2.388100 s elapsed ! [1.397015 s user, 0.264951 s system total] ! 37788 kB max resident size ! 0/641584 [0/642056] filesystem blocks in/out ! 194/4147 [195/4728] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent ! 3263/92 [3390/102] voluntary/involuntary context switches ! LWLock/WALBufMapping 3 calls, 654 us elapsed ! LWLock/WALWrite 3 calls, 60680 us elapsed ! LWLock/CheckpointerComm 3 calls, 318 us elapsed ! Client/ClientRead 1 calls, 151 us elapsed ! IO/DataFileExtend 16397 calls, 94000 us elapsed ! IO/DataFileWrite 14346 calls, 45911 us elapsed ! IO/WALInitSync 6 calls, 334840 us elapsed ! IO/WALInitWrite 6 calls, 48040 us elapsed ! IO/WALSync 17 calls, 353334 us elapsed ! IO/WALWrite 8362 calls, 38401 us elapsed Now, I'm on a fence about the user facing interaction. I'll keep thinking about this and report this week. In the meantime, any feedback about the current implementation backbone is welcome. Thanks! Regards,
Attachment
pgsql-hackers by date: