Re: About to add WAL write/fsync statistics to pg_stat_wal view - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: About to add WAL write/fsync statistics to pg_stat_wal view
Date
Msg-id CAD21AoB=RCYME3+iroY+8TrC9tsHTPo7kRKCpU7C_kZ8kgGW2A@mail.gmail.com
Whole thread Raw
In response to Re: About to add WAL write/fsync statistics to pg_stat_wal view  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
Responses Re: About to add WAL write/fsync statistics to pg_stat_wal view
List pgsql-hackers
On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> Hi,
>
> I rebased the patch to the master branch.

Thank you for working on this. I've read the latest patch. Here are comments:

---
+               if (track_wal_io_timing)
+               {
+                   INSTR_TIME_SET_CURRENT(duration);
+                   INSTR_TIME_SUBTRACT(duration, start);
+                   WalStats.m_wal_write_time +=
INSTR_TIME_GET_MILLISEC(duration);
+               }

* I think it should add the time in micro sec.

After running pgbench with track_wal_io_timing = on for 30 sec,
pg_stat_wal showed the following on my environment:

postgres(1:61569)=# select * from pg_stat_wal;
-[ RECORD 1 ]----+-----------------------------
wal_records      | 285947
wal_fpi          | 53285
wal_bytes        | 442008213
wal_buffers_full | 0
wal_write        | 25516
wal_write_time   | 0
wal_sync         | 25437
wal_sync_time    | 14490
stats_reset      | 2021-01-22 10:56:13.29464+09

Since writes can complete less than a millisecond, wal_write_time
didn't increase. I think sync_time could also have the same problem.

---
+   /*
+    * Measure i/o timing to fsync WAL data.
+    *
+    * The wal receiver skip to collect it to avoid performance
degradation of standy servers.
+    * If sync_method doesn't have its fsync method, to skip too.
+    */
+   if (!AmWalReceiverProcess() && track_wal_io_timing && fsyncMethodCalled())
+       INSTR_TIME_SET_CURRENT(start);

* Why does only the wal receiver skip it even if track_wal_io_timinig
is true? I think the performance degradation is also true for backend
processes. If there is another reason for that, I think it's better to
mention in both the doc and comment.

* How about checking track_wal_io_timing first?

* s/standy/standby/

---
+   /* increment the i/o timing and the number of times to fsync WAL data */
+   if (fsyncMethodCalled())
+   {
+       if (!AmWalReceiverProcess() && track_wal_io_timing)
+       {
+           INSTR_TIME_SET_CURRENT(duration);
+           INSTR_TIME_SUBTRACT(duration, start);
+           WalStats.m_wal_sync_time += INSTR_TIME_GET_MILLISEC(duration);
+       }
+
+       WalStats.m_wal_sync++;
+   }

* I'd avoid always calling fsyncMethodCalled() in this path. How about
incrementing m_wal_sync after each sync operation?

---
+/*
+ * Check if fsync mothod is called.
+ */
+static bool
+fsyncMethodCalled()
+{
+   if (!enableFsync)
+       return false;
+
+   switch (sync_method)
+   {
+       case SYNC_METHOD_FSYNC:
+       case SYNC_METHOD_FSYNC_WRITETHROUGH:
+       case SYNC_METHOD_FDATASYNC:
+           return true;
+       default:
+           /* others don't have a specific fsync method */
+           return false;
+   }
+}

* I'm concerned that the function name could confuse the reader
because it's called even before the fsync method is called. As I
commented above, calling to fsyncMethodCalled() can be eliminated.
That way, this function is called at only once. So do we really need
this function?

* As far as I read the code, issue_xlog_fsync() seems to do fsync even
if enableFsync is false. Why does the function return false in that
case? I might be missing something.

* void is missing as argument?

* s/mothod/method/

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Next
From: Ian Lawrence Barwick
Date:
Subject: Re: POC: postgres_fdw insert batching