Thread: Monitoring time of fsyncing WALs

Monitoring time of fsyncing WALs

From

Konstantin Knizhnik

Date:

27 June 2018, 22:32:18

I wonder why we are monitoring time of writing to WAL, but not time of 
fsyncing WAL segments?
Is there are principle reason for it or just because nobody added it yet?
If so, please find very small patch which adding WAIT_EVENT_WAL_FSYNC 
event type.

Our engineers in PgPro complain me that there is no information about 
time spent in syncing WALs...
Unfortunately Postgres still is not able to aggregate this statistic. 
But at least we have pg_wait_sampling extension for it:
https://github.com/postgrespro/pg_wait_sampling

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

wait_event_wal_fsync.patch

Re: Monitoring time of fsyncing WALs

From

David Steele

Date:

28 June 2018, 03:29:43

On 6/27/18 12:32 PM, Konstantin Knizhnik wrote:
> I wonder why we are monitoring time of writing to WAL, but not time of 
> fsyncing WAL segments?
> Is there are principle reason for it or just because nobody added it yet?
> If so, please find very small patch which adding WAIT_EVENT_WAL_FSYNC 
> event type.

+1.

-- 
-David
david@pgmasters.net

Re: Monitoring time of fsyncing WALs

From

Michael Paquier

Date:

28 June 2018, 08:27:21

On Wed, Jun 27, 2018 at 07:32:18PM +0300, Konstantin Knizhnik wrote:
> I wonder why we are monitoring time of writing to WAL, but not time of
> fsyncing WAL segments?
> Is there are principle reason for it or just because nobody added it yet?
> If so, please find very small patch which adding WAIT_EVENT_WAL_FSYNC event
> type.

Let's name it WAIT_EVENT_WAL_SYNC as it is more consistent with the
other wait events of the same type, and also list the wait event
alphabetically everywhere this is added.  I have also reworded the
documentation to be more consistent.

> Our engineers in PgPro complain me that there is no information about time
> spent in syncing WALs...
> Unfortunately Postgres still is not able to aggregate this statistic. But at
> least we have pg_wait_sampling extension for it:
> https://github.com/postgrespro/pg_wait_sampling

Complain justified.  It is a bit too late for v11 I think though, so
let's wait for v12 to open for business, and then I'll apply the patch
at if there are no objections until then.

Attached is an updated patch.
--
Michael

Attachment

Re: Monitoring time of fsyncing WALs

From

Robert Haas

Date:

29 June 2018, 18:48:33

On Wed, Jun 27, 2018 at 10:27 PM, Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Jun 27, 2018 at 07:32:18PM +0300, Konstantin Knizhnik wrote:
>> I wonder why we are monitoring time of writing to WAL, but not time of
>> fsyncing WAL segments?
>> Is there are principle reason for it or just because nobody added it yet?
>> If so, please find very small patch which adding WAIT_EVENT_WAL_FSYNC event
>> type.
>
> Let's name it WAIT_EVENT_WAL_SYNC as it is more consistent with the
> other wait events of the same type, and also list the wait event
> alphabetically everywhere this is added.  I have also reworded the
> documentation to be more consistent.
>
>> Our engineers in PgPro complain me that there is no information about time
>> spent in syncing WALs...
>> Unfortunately Postgres still is not able to aggregate this statistic. But at
>> least we have pg_wait_sampling extension for it:
>> https://github.com/postgrespro/pg_wait_sampling
>
> Complain justified.  It is a bit too late for v11 I think though, so
> let's wait for v12 to open for business, and then I'll apply the patch
> at if there are no objections until then.

Are there other instances of fsync() that also need to be covered?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Monitoring time of fsyncing WALs

From

Konstantin Knizhnik

Date:

29 June 2018, 19:26:34


On 29.06.2018 15:48, Robert Haas wrote:
> On Wed, Jun 27, 2018 at 10:27 PM, Michael Paquier <michael@paquier.xyz> wrote:
>> On Wed, Jun 27, 2018 at 07:32:18PM +0300, Konstantin Knizhnik wrote:
>>> I wonder why we are monitoring time of writing to WAL, but not time of
>>> fsyncing WAL segments?
>>> Is there are principle reason for it or just because nobody added it yet?
>>> If so, please find very small patch which adding WAIT_EVENT_WAL_FSYNC event
>>> type.
>> Let's name it WAIT_EVENT_WAL_SYNC as it is more consistent with the
>> other wait events of the same type, and also list the wait event
>> alphabetically everywhere this is added.  I have also reworded the
>> documentation to be more consistent.
>>
>>> Our engineers in PgPro complain me that there is no information about time
>>> spent in syncing WALs...
>>> Unfortunately Postgres still is not able to aggregate this statistic. But at
>>> least we have pg_wait_sampling extension for it:
>>> https://github.com/postgrespro/pg_wait_sampling
>> Complain justified.  It is a bit too late for v11 I think though, so
>> let's wait for v12 to open for business, and then I'll apply the patch
>> at if there are no objections until then.
> Are there other instances of fsync() that also need to be covered?
>
Syncing database files is already monitored (in mdsync and FileFlush).
If we grep for pg_fsync occurrences in Postgres code, then the only 
place where pgstat_report_wait_start is not called before is 
issue_xlog_fsync in xlog.c

-- 

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Monitoring time of fsyncing WALs

From

Michael Paquier

Date:

29 June 2018, 19:39:00

On Fri, Jun 29, 2018 at 08:48:33AM -0400, Robert Haas wrote:
> Are there other instances of fsync() that also need to be covered?

Yeah, I double-checked the calls to pg_fsync back when I looked at this
patch, but now that I look again I see at least two more of them:
- fsync_fname_ext.
- write_auto_conf_file, where a wait event for a write() is missing as
well.
Hm.  I am wondering if it would not be worth extending pg_fsync() with
an argument for a wait event and introduce a sort of pg_write() which
wraps write() with an extra wait event argument, and similarly for
read() (warning, conflict with Windows ahead!).  That's somehow related
to the feeling I had when working with transient file's writes and reads
a couple of days back.  Those are most likely going to be forgotten
again and again in the future.  In both cases the caller would still be
responsible for looking at errno and decide the error handling, but I
got no feedback about the idea on transient files.

There are also a couple of places where the wait events are longer than
they should.  For example in writeTimeLineHistory, there is a wait event
for write() which is still switched on even within an error code path.
And on top of it I think that a call to pgstat_report_wait_end() is
missing in the error code path as on ERROR the session still exists.
That's a bug.  Those need an extra lookup and visibly some cleanup back
to v10.
--
Michael

Attachment

signature.asc

Re: Monitoring time of fsyncing WALs

From

Michael Paquier

Date:

01 July 2018, 09:29:27

On Fri, Jun 29, 2018 at 10:39:00PM +0900, Michael Paquier wrote:
> On Fri, Jun 29, 2018 at 08:48:33AM -0400, Robert Haas wrote:
>> Are there other instances of fsync() that also need to be covered?
>
> Yeah, I double-checked the calls to pg_fsync back when I looked at this
> patch, but now that I look again I see at least two more of them:
> - fsync_fname_ext.

This one is used internally by things like durable_rename or such, on
which I am not sure that we actually can pass down correctly a wait
event as multiple syncs may happen within each call, so that would
require a weird API layer.

> - write_auto_conf_file, where a wait event for a write() is missing as
> well.

I have been looking at the archives and this has been left out on
purpose:
https://postgr.es/m/CAGPqQf0bzVfTTZdxcr4qHb3WDbn=+iH6sWchbN4HGjhwtcbPYQ@mail.gmail.com

> Hm.  I am wondering if it would not be worth extending pg_fsync() with
> an argument for a wait event and introduce a sort of pg_write() which
> wraps write() with an extra wait event argument, and similarly for
> read() (warning, conflict with Windows ahead!).  That's somehow related
> to the feeling I had when working with transient file's writes and reads
> a couple of days back.  Those are most likely going to be forgotten
> again and again in the future.  In both cases the caller would still be
> responsible for looking at errno and decide the error handling, but I
> got no feedback about the idea on transient files.

That would finish by being sort of ugly as durable_rename or such would
also need some treatment.  That would be quite invasive.

> There are also a couple of places where the wait events are longer than
> they should.  For example in writeTimeLineHistory, there is a wait event
> for write() which is still switched on even within an error code path.
> And on top of it I think that a call to pgstat_report_wait_end() is
> missing in the error code path as on ERROR the session still exists.
> That's a bug.  Those need an extra lookup and visibly some cleanup back
> to v10.

On ERROR a backend would switch back quickly on ClientRead.  Perhaps
we'd want to call pgstat_report_wait_end() when an error or higher is
thrown?  That's a different discussion.

So at the end, I would like to use the proposed patch and call it a
day.  Thoughts?
--
Michael

Attachment

signature.asc

Re: Monitoring time of fsyncing WALs

From

Craig Ringer

Date:

02 July 2018, 09:36:06

On 1 July 2018 at 11:29, Michael Paquier <michael@paquier.xyz> wrote:

So at the end, I would like to use the proposed patch and call it a
day. Thoughts?

Patch looks good.

I'll hijack the thread to add a few more perf/dtrace tracepoints in the WAL code, as they were also missing. Proposed rider patch attached.

I've updated https://wiki.postgresql.org/wiki/Profiling_with_perf#PostgreSQL_pre-defined_tracepoint_events to document how to add tracepoints.

It's not that hard to trace duration of a given function call with dtrace without such annotations, but they make it much easier to discover where in a large codebase to look, providing a form of documentation. With perf they make life much easier. I should add some more to make it easier to analyse relation extension contention on indexes and the heap, instrument btree index ops like page splits, instrument heavyweight locking (beyond LOCK_WAIT_START), etc. They're also handy for gdb - https://sourceware.org/gdb/onlinedocs/gdb/Static-Probe-Points.html - also handy for gdb's "info probes".

BTW, we might want to instrument the pgstat_ counter calls and pgstat_report_wait_start / pgstat_report_wait_end, but it's easy enough to use dynamic tracepoints for those so I haven't yet. Maybe even just document them as points of interest.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

v1-0001-Add-more-statically-defined-tracepoints-around-xl.patch

Re: Monitoring time of fsyncing WALs

From

Michael Paquier

Date:

02 July 2018, 19:25:16

On Mon, Jul 02, 2018 at 11:36:06AM +0800, Craig Ringer wrote:
> On 1 July 2018 at 11:29, Michael Paquier <michael@paquier.xyz> wrote:
>> So at the end, I would like to use the proposed patch and call it a
>> day.  Thoughts?
>>
> Patch looks good.

Thanks Craig for the review!  I have just pushed the previous patch with
the adjustments mentioned.  Regarding enforcing pg_fsync with a wait
event, I think I'll step back on this one.  There are also arguments for
allowing code paths to not have wait events as that's quite low-level,
and for extensions the choice is limited as long as there is no way to
register custom names, if that ever happens, which I am not sure of.

> I'll hijack the thread to add a few more perf/dtrace tracepoints in the WAL
> code, as they were also missing. Proposed rider patch attached.

:)

As that is a separate discussion and as the commit fest has already
begun with many items in the queue, could you begin a new thread and add
this stuff within a new CF entry?  I would say that having more trace
points in this area could be helpful, but let's discuss that
appropriately so as your proposal catches proper attention.

> BTW, we might want to instrument the pgstat_ counter calls
> and pgstat_report_wait_start / pgstat_report_wait_end, but it's easy enough
> to use dynamic tracepoints for those so I haven't yet. Maybe even just
> document them as points of interest.

Perhaps.  Feel free to propose anything you have in mind of course.
--
Michael

Attachment

signature.asc