Re: About to add WAL write/fsync statistics to pg_stat_wal view - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: About to add WAL write/fsync statistics to pg_stat_wal view
Date
Msg-id 95c26651303d0e5227cb763061cf15dc@oss.nttdata.com
Whole thread Raw
In response to Re: About to add WAL write/fsync statistics to pg_stat_wal view  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: About to add WAL write/fsync statistics to pg_stat_wal view  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
List pgsql-hackers
On 2021-01-25 10:36, Masahiko Sawada wrote:
> On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda
> <ikedamsh@oss.nttdata.com> wrote:
>> 
>> On 2021-01-22 14:50, Masahiko Sawada wrote:
>> > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda
>> > <ikedamsh@oss.nttdata.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I rebased the patch to the master branch.
>> >
>> > Thank you for working on this. I've read the latest patch. Here are
>> > comments:
>> >
>> > ---
>> > +               if (track_wal_io_timing)
>> > +               {
>> > +                   INSTR_TIME_SET_CURRENT(duration);
>> > +                   INSTR_TIME_SUBTRACT(duration, start);
>> > +                   WalStats.m_wal_write_time +=
>> > INSTR_TIME_GET_MILLISEC(duration);
>> > +               }
>> >
>> > * I think it should add the time in micro sec.
>> > After running pgbench with track_wal_io_timing = on for 30 sec,
>> > pg_stat_wal showed the following on my environment:
>> >
>> > postgres(1:61569)=# select * from pg_stat_wal;
>> > -[ RECORD 1 ]----+-----------------------------
>> > wal_records      | 285947
>> > wal_fpi          | 53285
>> > wal_bytes        | 442008213
>> > wal_buffers_full | 0
>> > wal_write        | 25516
>> > wal_write_time   | 0
>> > wal_sync         | 25437
>> > wal_sync_time    | 14490
>> > stats_reset      | 2021-01-22 10:56:13.29464+09
>> >
>> > Since writes can complete less than a millisecond, wal_write_time
>> > didn't increase. I think sync_time could also have the same problem.
>> 
>> Thanks for your comments. I didn't notice that.
>> I changed the unit from milliseconds to microseconds.
>> 
>> > ---
>> > +   /*
>> > +    * Measure i/o timing to fsync WAL data.
>> > +    *
>> > +    * The wal receiver skip to collect it to avoid performance
>> > degradation of standy servers.
>> > +    * If sync_method doesn't have its fsync method, to skip too.
>> > +    */
>> > +   if (!AmWalReceiverProcess() && track_wal_io_timing &&
>> > fsyncMethodCalled())
>> > +       INSTR_TIME_SET_CURRENT(start);
>> >
>> > * Why does only the wal receiver skip it even if track_wal_io_timinig
>> > is true? I think the performance degradation is also true for backend
>> > processes. If there is another reason for that, I think it's better to
>> > mention in both the doc and comment.
>> > * How about checking track_wal_io_timing first?
>> > * s/standy/standby/
>> 
>> I fixed it.
>> As kuroda-san mentioned too, the skip is no need to be considered.
> 
> I think you also removed the code to have the wal receiver report the
> stats. So with the latest patch, the wal receiver tracks those
> statistics but doesn't report.
> And maybe XLogWalRcvWrite() also needs to track I/O?

Thanks, I forgot to add them.
I'll fix it.


>> 
>> > ---
>> > +   /* increment the i/o timing and the number of times to fsync WAL
>> > data */
>> > +   if (fsyncMethodCalled())
>> > +   {
>> > +       if (!AmWalReceiverProcess() && track_wal_io_timing)
>> > +       {
>> > +           INSTR_TIME_SET_CURRENT(duration);
>> > +           INSTR_TIME_SUBTRACT(duration, start);
>> > +           WalStats.m_wal_sync_time +=
>> > INSTR_TIME_GET_MILLISEC(duration);
>> > +       }
>> > +
>> > +       WalStats.m_wal_sync++;
>> > +   }
>> >
>> > * I'd avoid always calling fsyncMethodCalled() in this path. How about
>> > incrementing m_wal_sync after each sync operation?
>> 
>> I think if syncing the disk does not occur, m_wal_sync should not be
>> incremented.
>> It depends enableFsync and sync_method.
>> 
>> enableFsync is checked in each fsync method like
>> pg_fsync_no_writethrough(),
>> so if incrementing m_wal_sync after each sync operation, it should be
>> implemented
>> in each fsync method. It leads to many duplicated codes.
> 
> Right. I missed that each fsync function checks enableFsync.
> 
>> So, why don't you change the function to a flag whether to
>> sync data to the disk will be occurred or not in issue_xlog_fsync()?
> 
> Looks better. Since we don't necessarily need to increment m_wal_sync
> after doing fsync we can write the code without an additional variable
> as follows:
> 
>     if (enableFsync)
>     {
>         switch (sync_method)
>         {
>             case SYNC_METHOD_FSYNC:
> #ifdef HAVE_FSYNC_WRITETHROUGH
>             case SYNC_METHOD_FSYNC_WRITETHROUGH:
> #endif
> #ifdef HAVE_FDATASYNC
>             case SYNC_METHOD_FDATASYNC:
> #endif
>                 WalStats.m_wal_sync++;
>                 if (track_wal_io_timing)
>                     INSTR_TIME_SET_CURRENT(start);
>                 break;
>             default:
>                 break;
>         }
>     }
> 
>   (do fsync and error handling here)
> 
>    /* increment the i/o timing and the number of times to fsync WAL 
> data */
>    if (track_wal_io_timing)
>    {
>        INSTR_TIME_SET_CURRENT(duration);
>        INSTR_TIME_SUBTRACT(duration, start);
>        WalStats.m_wal_sync_time =  INSTR_TIME_GET_MICROSEC(duration);
>    }

IIUC, I think we can't handle the following case.

When "sync_method" is SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC and
"track_wal_io_timing" is enabled, "start" doesn't be initialized.

My understanding is something wrong, isn't it?


> I think we can change the first switch-case to an if statement.

Yes, I'll change it.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: "kuroda.hayato@fujitsu.com"
Date:
Subject: RE: About to add WAL write/fsync statistics to pg_stat_wal view
Next
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: POC: postgres_fdw insert batching