Re: About to add WAL write/fsync statistics to pg_stat_wal view - Mailing list pgsql-hackers

From japin
Subject Re: About to add WAL write/fsync statistics to pg_stat_wal view
Date
Msg-id MEYP282MB1669B25B065F504ACB6C06E0B6A00@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
Whole thread Raw
In response to Re: About to add WAL write/fsync statistics to pg_stat_wal view  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
Responses Re: About to add WAL write/fsync statistics to pg_stat_wal view  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
List pgsql-hackers
Hi, Masahiro

Thanks for you update the v4 patch.  Here are some comments:

(1)
+       char            *msg = NULL;
+       bool            sync_called;            /* whether to sync data to the disk. */
+       instr_time      start;
+       instr_time      duration;
+
+       /* check whether to sync data to the disk is really occurred. */
+       sync_called = false;

Maybe we can initialize the "sync_called" variable when declare it.

(2)
+       if (sync_called)
+       {
+               /* increment the i/o timing and the number of times to fsync WAL data */
+               if (track_wal_io_timing)
+               {
+                       INSTR_TIME_SET_CURRENT(duration);
+                       INSTR_TIME_SUBTRACT(duration, start);
+                       WalStats.m_wal_sync_time =  INSTR_TIME_GET_MICROSEC(duration);
+               }
+
+               WalStats.m_wal_sync++;
+       }

There is an extra space before INSTR_TIME_GET_MICROSEC(duration).

In the issue_xlog_fsync(), the comment says that if sync_method is
SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC, it already write synced.
Does that mean it synced when write the WAL data?  And for those cases, we
cannot get accurate write/sync timing and number of write/sync times, right?

        case SYNC_METHOD_OPEN:
        case SYNC_METHOD_OPEN_DSYNC:
            /* write synced it already */
            break;

On Fri, 22 Jan 2021 at 21:05, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
> On 2021-01-22 14:50, Masahiko Sawada wrote:
>> On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda 
>> <ikedamsh@oss.nttdata.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I rebased the patch to the master branch.
>> 
>> Thank you for working on this. I've read the latest patch. Here are 
>> comments:
>> 
>> ---
>> +               if (track_wal_io_timing)
>> +               {
>> +                   INSTR_TIME_SET_CURRENT(duration);
>> +                   INSTR_TIME_SUBTRACT(duration, start);
>> +                   WalStats.m_wal_write_time +=
>> INSTR_TIME_GET_MILLISEC(duration);
>> +               }
>> 
>> * I think it should add the time in micro sec.
>> After running pgbench with track_wal_io_timing = on for 30 sec,
>> pg_stat_wal showed the following on my environment:
>> 
>> postgres(1:61569)=# select * from pg_stat_wal;
>> -[ RECORD 1 ]----+-----------------------------
>> wal_records      | 285947
>> wal_fpi          | 53285
>> wal_bytes        | 442008213
>> wal_buffers_full | 0
>> wal_write        | 25516
>> wal_write_time   | 0
>> wal_sync         | 25437
>> wal_sync_time    | 14490
>> stats_reset      | 2021-01-22 10:56:13.29464+09
>> 
>> Since writes can complete less than a millisecond, wal_write_time
>> didn't increase. I think sync_time could also have the same problem.
>
> Thanks for your comments. I didn't notice that.
> I changed the unit from milliseconds to microseconds.
>
>> ---
>> +   /*
>> +    * Measure i/o timing to fsync WAL data.
>> +    *
>> +    * The wal receiver skip to collect it to avoid performance
>> degradation of standy servers.
>> +    * If sync_method doesn't have its fsync method, to skip too.
>> +    */
>> +   if (!AmWalReceiverProcess() && track_wal_io_timing && 
>> fsyncMethodCalled())
>> +       INSTR_TIME_SET_CURRENT(start);
>> 
>> * Why does only the wal receiver skip it even if track_wal_io_timinig
>> is true? I think the performance degradation is also true for backend
>> processes. If there is another reason for that, I think it's better to
>> mention in both the doc and comment.
>> * How about checking track_wal_io_timing first?
>> * s/standy/standby/
>
> I fixed it.
> As kuroda-san mentioned too, the skip is no need to be considered.
>
>> ---
>> +   /* increment the i/o timing and the number of times to fsync WAL 
>> data */
>> +   if (fsyncMethodCalled())
>> +   {
>> +       if (!AmWalReceiverProcess() && track_wal_io_timing)
>> +       {
>> +           INSTR_TIME_SET_CURRENT(duration);
>> +           INSTR_TIME_SUBTRACT(duration, start);
>> +           WalStats.m_wal_sync_time += 
>> INSTR_TIME_GET_MILLISEC(duration);
>> +       }
>> +
>> +       WalStats.m_wal_sync++;
>> +   }
>> 
>> * I'd avoid always calling fsyncMethodCalled() in this path. How about
>> incrementing m_wal_sync after each sync operation?
>
> I think if syncing the disk does not occur, m_wal_sync should not be 
> incremented.
> It depends enableFsync and sync_method.
>
> enableFsync is checked in each fsync method like 
> pg_fsync_no_writethrough(),
> so if incrementing m_wal_sync after each sync operation, it should be 
> implemented
> in each fsync method. It leads to many duplicated codes.
>
> So, why don't you change the function to a flag whether to
> sync data to the disk will be occurred or not in issue_xlog_fsync()?
>
>
>> ---
>> +/*
>> + * Check if fsync mothod is called.
>> + */
>> +static bool
>> +fsyncMethodCalled()
>> +{
>> +   if (!enableFsync)
>> +       return false;
>> +
>> +   switch (sync_method)
>> +   {
>> +       case SYNC_METHOD_FSYNC:
>> +       case SYNC_METHOD_FSYNC_WRITETHROUGH:
>> +       case SYNC_METHOD_FDATASYNC:
>> +           return true;
>> +       default:
>> +           /* others don't have a specific fsync method */
>> +           return false;
>> +   }
>> +}
>> 
>> * I'm concerned that the function name could confuse the reader
>> because it's called even before the fsync method is called. As I
>> commented above, calling to fsyncMethodCalled() can be eliminated.
>> That way, this function is called at only once. So do we really need
>> this function?
>
> Thanks to your comments, I removed them.
>
>
>> * As far as I read the code, issue_xlog_fsync() seems to do fsync even
>> if enableFsync is false. Why does the function return false in that
>> case? I might be missing something.
>
> IIUC, the reason is that I thought that each fsync functions like 
> pg_fsync_no_writethrough() check enableFsync.
>
> If this code doesn't check, m_wal_sync_time may be incremented
> even though some sync methods like SYNC_METHOD_OPEN don't call to sync 
> some data to the disk at the time.
>
>> * void is missing as argument?
>> 
>> * s/mothod/method/
>
> I removed them.
>
>
> Regards,


-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: [PoC] Non-volatile WAL buffer
Next
From: Tom Lane
Date:
Subject: Re: [PATCH 1/1] Fix detection of pwritev support for OSX.