Thread: Monitoring time of fsyncing WALs
I wonder why we are monitoring time of writing to WAL, but not time of fsyncing WAL segments? Is there are principle reason for it or just because nobody added it yet? If so, please find very small patch which adding WAIT_EVENT_WAL_FSYNC event type. Our engineers in PgPro complain me that there is no information about time spent in syncing WALs... Unfortunately Postgres still is not able to aggregate this statistic. But at least we have pg_wait_sampling extension for it: https://github.com/postgrespro/pg_wait_sampling -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 6/27/18 12:32 PM, Konstantin Knizhnik wrote: > I wonder why we are monitoring time of writing to WAL, but not time of > fsyncing WAL segments? > Is there are principle reason for it or just because nobody added it yet? > If so, please find very small patch which adding WAIT_EVENT_WAL_FSYNC > event type. +1. -- -David david@pgmasters.net
On Wed, Jun 27, 2018 at 07:32:18PM +0300, Konstantin Knizhnik wrote: > I wonder why we are monitoring time of writing to WAL, but not time of > fsyncing WAL segments? > Is there are principle reason for it or just because nobody added it yet? > If so, please find very small patch which adding WAIT_EVENT_WAL_FSYNC event > type. Let's name it WAIT_EVENT_WAL_SYNC as it is more consistent with the other wait events of the same type, and also list the wait event alphabetically everywhere this is added. I have also reworded the documentation to be more consistent. > Our engineers in PgPro complain me that there is no information about time > spent in syncing WALs... > Unfortunately Postgres still is not able to aggregate this statistic. But at > least we have pg_wait_sampling extension for it: > https://github.com/postgrespro/pg_wait_sampling Complain justified. It is a bit too late for v11 I think though, so let's wait for v12 to open for business, and then I'll apply the patch at if there are no objections until then. Attached is an updated patch. -- Michael
Attachment
On Wed, Jun 27, 2018 at 10:27 PM, Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Jun 27, 2018 at 07:32:18PM +0300, Konstantin Knizhnik wrote: >> I wonder why we are monitoring time of writing to WAL, but not time of >> fsyncing WAL segments? >> Is there are principle reason for it or just because nobody added it yet? >> If so, please find very small patch which adding WAIT_EVENT_WAL_FSYNC event >> type. > > Let's name it WAIT_EVENT_WAL_SYNC as it is more consistent with the > other wait events of the same type, and also list the wait event > alphabetically everywhere this is added. I have also reworded the > documentation to be more consistent. > >> Our engineers in PgPro complain me that there is no information about time >> spent in syncing WALs... >> Unfortunately Postgres still is not able to aggregate this statistic. But at >> least we have pg_wait_sampling extension for it: >> https://github.com/postgrespro/pg_wait_sampling > > Complain justified. It is a bit too late for v11 I think though, so > let's wait for v12 to open for business, and then I'll apply the patch > at if there are no objections until then. Are there other instances of fsync() that also need to be covered? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 29.06.2018 15:48, Robert Haas wrote: > On Wed, Jun 27, 2018 at 10:27 PM, Michael Paquier <michael@paquier.xyz> wrote: >> On Wed, Jun 27, 2018 at 07:32:18PM +0300, Konstantin Knizhnik wrote: >>> I wonder why we are monitoring time of writing to WAL, but not time of >>> fsyncing WAL segments? >>> Is there are principle reason for it or just because nobody added it yet? >>> If so, please find very small patch which adding WAIT_EVENT_WAL_FSYNC event >>> type. >> Let's name it WAIT_EVENT_WAL_SYNC as it is more consistent with the >> other wait events of the same type, and also list the wait event >> alphabetically everywhere this is added. I have also reworded the >> documentation to be more consistent. >> >>> Our engineers in PgPro complain me that there is no information about time >>> spent in syncing WALs... >>> Unfortunately Postgres still is not able to aggregate this statistic. But at >>> least we have pg_wait_sampling extension for it: >>> https://github.com/postgrespro/pg_wait_sampling >> Complain justified. It is a bit too late for v11 I think though, so >> let's wait for v12 to open for business, and then I'll apply the patch >> at if there are no objections until then. > Are there other instances of fsync() that also need to be covered? > Syncing database files is already monitored (in mdsync and FileFlush). If we grep for pg_fsync occurrences in Postgres code, then the only place where pgstat_report_wait_start is not called before is issue_xlog_fsync in xlog.c -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Fri, Jun 29, 2018 at 08:48:33AM -0400, Robert Haas wrote: > Are there other instances of fsync() that also need to be covered? Yeah, I double-checked the calls to pg_fsync back when I looked at this patch, but now that I look again I see at least two more of them: - fsync_fname_ext. - write_auto_conf_file, where a wait event for a write() is missing as well. Hm. I am wondering if it would not be worth extending pg_fsync() with an argument for a wait event and introduce a sort of pg_write() which wraps write() with an extra wait event argument, and similarly for read() (warning, conflict with Windows ahead!). That's somehow related to the feeling I had when working with transient file's writes and reads a couple of days back. Those are most likely going to be forgotten again and again in the future. In both cases the caller would still be responsible for looking at errno and decide the error handling, but I got no feedback about the idea on transient files. There are also a couple of places where the wait events are longer than they should. For example in writeTimeLineHistory, there is a wait event for write() which is still switched on even within an error code path. And on top of it I think that a call to pgstat_report_wait_end() is missing in the error code path as on ERROR the session still exists. That's a bug. Those need an extra lookup and visibly some cleanup back to v10. -- Michael
Attachment
On Fri, Jun 29, 2018 at 10:39:00PM +0900, Michael Paquier wrote: > On Fri, Jun 29, 2018 at 08:48:33AM -0400, Robert Haas wrote: >> Are there other instances of fsync() that also need to be covered? > > Yeah, I double-checked the calls to pg_fsync back when I looked at this > patch, but now that I look again I see at least two more of them: > - fsync_fname_ext. This one is used internally by things like durable_rename or such, on which I am not sure that we actually can pass down correctly a wait event as multiple syncs may happen within each call, so that would require a weird API layer. > - write_auto_conf_file, where a wait event for a write() is missing as > well. I have been looking at the archives and this has been left out on purpose: https://postgr.es/m/CAGPqQf0bzVfTTZdxcr4qHb3WDbn=+iH6sWchbN4HGjhwtcbPYQ@mail.gmail.com > Hm. I am wondering if it would not be worth extending pg_fsync() with > an argument for a wait event and introduce a sort of pg_write() which > wraps write() with an extra wait event argument, and similarly for > read() (warning, conflict with Windows ahead!). That's somehow related > to the feeling I had when working with transient file's writes and reads > a couple of days back. Those are most likely going to be forgotten > again and again in the future. In both cases the caller would still be > responsible for looking at errno and decide the error handling, but I > got no feedback about the idea on transient files. That would finish by being sort of ugly as durable_rename or such would also need some treatment. That would be quite invasive. > There are also a couple of places where the wait events are longer than > they should. For example in writeTimeLineHistory, there is a wait event > for write() which is still switched on even within an error code path. > And on top of it I think that a call to pgstat_report_wait_end() is > missing in the error code path as on ERROR the session still exists. > That's a bug. Those need an extra lookup and visibly some cleanup back > to v10. On ERROR a backend would switch back quickly on ClientRead. Perhaps we'd want to call pgstat_report_wait_end() when an error or higher is thrown? That's a different discussion. So at the end, I would like to use the proposed patch and call it a day. Thoughts? -- Michael
Attachment
On 1 July 2018 at 11:29, Michael Paquier <michael@paquier.xyz> wrote:
--
So at the end, I would like to use the proposed patch and call it a
day. Thoughts?
Patch looks good.
I'll hijack the thread to add a few more perf/dtrace tracepoints in the WAL code, as they were also missing. Proposed rider patch attached.
I've updated https://wiki.postgresql.org/wiki/Profiling_with_perf#PostgreSQL_pre-defined_tracepoint_events to document how to add tracepoints.
It's not that hard to trace duration of a given function call with dtrace without such annotations, but they make it much easier to discover where in a large codebase to look, providing a form of documentation. With perf they make life much easier. I should add some more to make it easier to analyse relation extension contention on indexes and the heap, instrument btree index ops like page splits, instrument heavyweight locking (beyond LOCK_WAIT_START), etc. They're also handy for gdb - https://sourceware.org/gdb/onlinedocs/gdb/Static-Probe-Points.html - also handy for gdb's "info probes".
BTW, we might want to instrument the pgstat_ counter calls and pgstat_report_wait_start / pgstat_report_wait_end, but it's easy enough to use dynamic tracepoints for those so I haven't yet. Maybe even just document them as points of interest.
Attachment
On Mon, Jul 02, 2018 at 11:36:06AM +0800, Craig Ringer wrote: > On 1 July 2018 at 11:29, Michael Paquier <michael@paquier.xyz> wrote: >> So at the end, I would like to use the proposed patch and call it a >> day. Thoughts? >> > Patch looks good. Thanks Craig for the review! I have just pushed the previous patch with the adjustments mentioned. Regarding enforcing pg_fsync with a wait event, I think I'll step back on this one. There are also arguments for allowing code paths to not have wait events as that's quite low-level, and for extensions the choice is limited as long as there is no way to register custom names, if that ever happens, which I am not sure of. > I'll hijack the thread to add a few more perf/dtrace tracepoints in the WAL > code, as they were also missing. Proposed rider patch attached. :) As that is a separate discussion and as the commit fest has already begun with many items in the queue, could you begin a new thread and add this stuff within a new CF entry? I would say that having more trace points in this area could be helpful, but let's discuss that appropriately so as your proposal catches proper attention. > BTW, we might want to instrument the pgstat_ counter calls > and pgstat_report_wait_start / pgstat_report_wait_end, but it's easy enough > to use dynamic tracepoints for those so I haven't yet. Maybe even just > document them as points of interest. Perhaps. Feel free to propose anything you have in mind of course. -- Michael