Re: About to add WAL write/fsync statistics to pg_stat_wal view - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: About to add WAL write/fsync statistics to pg_stat_wal view
Date
Msg-id CAD21AoAMxDwey2v5tVF-3Xhu2VNUXA-9CUuZ1ZmKS3__nceJTw@mail.gmail.com
Whole thread Raw
In response to Re: About to add WAL write/fsync statistics to pg_stat_wal view  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
Responses Re: About to add WAL write/fsync statistics to pg_stat_wal view  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
List pgsql-hackers
On Mon, Jan 25, 2021 at 1:28 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> On 2021-01-25 13:15, Masahiro Ikeda wrote:
> > On 2021-01-25 10:36, Masahiko Sawada wrote:
> >> On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda
> >> <ikedamsh@oss.nttdata.com> wrote:
> >>>
> >>> On 2021-01-22 14:50, Masahiko Sawada wrote:
> >>> > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda
> >>> > <ikedamsh@oss.nttdata.com> wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I rebased the patch to the master branch.
> >>> >
> >>> > Thank you for working on this. I've read the latest patch. Here are
> >>> > comments:
> >>> >
> >>> > ---
> >>> > +               if (track_wal_io_timing)
> >>> > +               {
> >>> > +                   INSTR_TIME_SET_CURRENT(duration);
> >>> > +                   INSTR_TIME_SUBTRACT(duration, start);
> >>> > +                   WalStats.m_wal_write_time +=
> >>> > INSTR_TIME_GET_MILLISEC(duration);
> >>> > +               }
> >>> >
> >>> > * I think it should add the time in micro sec.
> >>> > After running pgbench with track_wal_io_timing = on for 30 sec,
> >>> > pg_stat_wal showed the following on my environment:
> >>> >
> >>> > postgres(1:61569)=# select * from pg_stat_wal;
> >>> > -[ RECORD 1 ]----+-----------------------------
> >>> > wal_records      | 285947
> >>> > wal_fpi          | 53285
> >>> > wal_bytes        | 442008213
> >>> > wal_buffers_full | 0
> >>> > wal_write        | 25516
> >>> > wal_write_time   | 0
> >>> > wal_sync         | 25437
> >>> > wal_sync_time    | 14490
> >>> > stats_reset      | 2021-01-22 10:56:13.29464+09
> >>> >
> >>> > Since writes can complete less than a millisecond, wal_write_time
> >>> > didn't increase. I think sync_time could also have the same problem.
> >>>
> >>> Thanks for your comments. I didn't notice that.
> >>> I changed the unit from milliseconds to microseconds.
> >>>
> >>> > ---
> >>> > +   /*
> >>> > +    * Measure i/o timing to fsync WAL data.
> >>> > +    *
> >>> > +    * The wal receiver skip to collect it to avoid performance
> >>> > degradation of standy servers.
> >>> > +    * If sync_method doesn't have its fsync method, to skip too.
> >>> > +    */
> >>> > +   if (!AmWalReceiverProcess() && track_wal_io_timing &&
> >>> > fsyncMethodCalled())
> >>> > +       INSTR_TIME_SET_CURRENT(start);
> >>> >
> >>> > * Why does only the wal receiver skip it even if track_wal_io_timinig
> >>> > is true? I think the performance degradation is also true for backend
> >>> > processes. If there is another reason for that, I think it's better to
> >>> > mention in both the doc and comment.
> >>> > * How about checking track_wal_io_timing first?
> >>> > * s/standy/standby/
> >>>
> >>> I fixed it.
> >>> As kuroda-san mentioned too, the skip is no need to be considered.
> >>
> >> I think you also removed the code to have the wal receiver report the
> >> stats. So with the latest patch, the wal receiver tracks those
> >> statistics but doesn't report.
> >> And maybe XLogWalRcvWrite() also needs to track I/O?
> >
> > Thanks, I forgot to add them.
> > I'll fix it.
> >
> >
> >>>
> >>> > ---
> >>> > +   /* increment the i/o timing and the number of times to fsync WAL
> >>> > data */
> >>> > +   if (fsyncMethodCalled())
> >>> > +   {
> >>> > +       if (!AmWalReceiverProcess() && track_wal_io_timing)
> >>> > +       {
> >>> > +           INSTR_TIME_SET_CURRENT(duration);
> >>> > +           INSTR_TIME_SUBTRACT(duration, start);
> >>> > +           WalStats.m_wal_sync_time +=
> >>> > INSTR_TIME_GET_MILLISEC(duration);
> >>> > +       }
> >>> > +
> >>> > +       WalStats.m_wal_sync++;
> >>> > +   }
> >>> >
> >>> > * I'd avoid always calling fsyncMethodCalled() in this path. How about
> >>> > incrementing m_wal_sync after each sync operation?
> >>>
> >>> I think if syncing the disk does not occur, m_wal_sync should not be
> >>> incremented.
> >>> It depends enableFsync and sync_method.
> >>>
> >>> enableFsync is checked in each fsync method like
> >>> pg_fsync_no_writethrough(),
> >>> so if incrementing m_wal_sync after each sync operation, it should be
> >>> implemented
> >>> in each fsync method. It leads to many duplicated codes.
> >>
> >> Right. I missed that each fsync function checks enableFsync.
> >>
> >>> So, why don't you change the function to a flag whether to
> >>> sync data to the disk will be occurred or not in issue_xlog_fsync()?
> >>
> >> Looks better. Since we don't necessarily need to increment m_wal_sync
> >> after doing fsync we can write the code without an additional variable
> >> as follows:
> >>
> >>     if (enableFsync)
> >>     {
> >>         switch (sync_method)
> >>         {
> >>             case SYNC_METHOD_FSYNC:
> >> #ifdef HAVE_FSYNC_WRITETHROUGH
> >>             case SYNC_METHOD_FSYNC_WRITETHROUGH:
> >> #endif
> >> #ifdef HAVE_FDATASYNC
> >>             case SYNC_METHOD_FDATASYNC:
> >> #endif
> >>                 WalStats.m_wal_sync++;
> >>                 if (track_wal_io_timing)
> >>                     INSTR_TIME_SET_CURRENT(start);
> >>                 break;
> >>             default:
> >>                 break;
> >>         }
> >>     }
> >>
> >>   (do fsync and error handling here)
> >>
> >>    /* increment the i/o timing and the number of times to fsync WAL
> >> data */
> >>    if (track_wal_io_timing)
> >>    {
> >>        INSTR_TIME_SET_CURRENT(duration);
> >>        INSTR_TIME_SUBTRACT(duration, start);
> >>        WalStats.m_wal_sync_time =  INSTR_TIME_GET_MICROSEC(duration);
> >>    }
> >
> > IIUC, I think we can't handle the following case.
> >
> > When "sync_method" is SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC and
> > "track_wal_io_timing" is enabled, "start" doesn't be initialized.
> >
> > My understanding is something wrong, isn't it?

You're right. We might want to initialize 'start' with 0 in those two
cases and check if INSTR_TIME_IS_ZERO() later when accumulating the
I/O time.

>
> I thought the following is better.
>
>
> ```
>         /* Measure i/o timing to sync WAL data.*/
>         if (track_wal_io_timing)
>                 INSTR_TIME_SET_CURRENT(start);
>
>    (do fsync and error handling here)
>
>         /* check whether to sync WAL data to the disk right now. */
>         if (enableFsync)
>         {
>                 if ((sync_method == SYNC_METHOD_FSYNC) ||
>                         (sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH) ||
>                         (sync_method == SYNC_METHOD_FDATASYNC))
>                 {
>                         /* increment the i/o timing and the number of times to fsync WAL data
> */
>                         if (track_wal_io_timing)
>                         {
>                                 instr_time      duration;
>
>                                 INSTR_TIME_SET_CURRENT(duration);
>                                 INSTR_TIME_SUBTRACT(duration, start);
>                                 WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration);
>                         }
>                         WalStats.m_wal_sync++;
>                 }
>         }
> ```
>
> Although INSTR_TIME_SET_CURRENT(start) is called everytime regardless
> of the "sync_method" and "enableFsync", we don't make additional
> variables.
> But it's ok because "track_wal_io_timing" leads already performance
> degradation.
>
> What do you think?

That also fine with me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: Is Recovery actually paused?
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Is it useful to record whether plans are generic or custom?