Re: [HACKERS] wait events for disk I/O - Mailing list pgsql-hackers

From Rushabh Lathia
Subject Re: [HACKERS] wait events for disk I/O
Date
Msg-id CAGPqQf2-CYrM+dxPoge5Gm_G+QLq-PObbezWyjt=Vs2RsspnMA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] wait events for disk I/O  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] wait events for disk I/O  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Sat, Mar 4, 2017 at 7:53 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Feb 20, 2017 at 4:04 PM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
>
> My colleague Rahila reported compilation issue with
> the patch. Issue was only coming with we do the clean
> build on the branch.
>
> Fixed the same into latest version of patch.
>

Few assorted comments:

1.
+        <row>
+         <entry><literal>WriteBuffile</></entry>
+         <entry>Waiting during
buffile read operation.</entry>
+        </row>

Operation name and definition are not matching.


Will fix this.
 

2.
+FilePrefetch(File file, off_t offset, int amount, uint32 wait_event_info)
 {
 #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
  int returnCode;
@@ -1527,8 +1527,10 @@ FilePrefetch(File file, off_t offset, int amount)
  if (returnCode < 0)
  return returnCode;

+ pgstat_report_wait_start(wait_event_info);
  returnCode = posix_fadvise(VfdCache[file].fd, offset, amount,
    POSIX_FADV_WILLNEED);
+ pgstat_report_wait_end();

AFAIK, this call is non-blocking and will just initiate a read, so do
you think we should record wait event for such a call.


Yes, prefecth call is a non-blocking and will just initiate a read. But this info
about the prefetch into wait events will give more info about the system.

3.
- written = FileWrite(src->vfd, waldata_start, len);
+ written = FileWrite(src->vfd, waldata_start, len,
+ WAIT_EVENT_WRITE_REWRITE_DATA_BLOCK);
  if (written != len)
  ereport(ERROR,
  (errcode_for_file_access(),
@@ -957,7 +960,7 @@ logical_end_heap_rewrite(RewriteState state)
  hash_seq_init(&seq_status, state->rs_logical_mappings);
  while ((src = (RewriteMappingFile *) hash_seq_search(&seq_status)) != NULL)
  {
- if (FileSync(src->vfd) != 0)
+ if (FileSync(src->vfd, WAIT_EVENT_SYNC_REWRITE_DATA_BLOCK) != 0)


Do we want to consider recording wait event for both write and sync?
It seems to me OS level writes are relatively cheap and sync calls are
expensive, so shouldn't we just log for sync calls?  I could see the
similar usage at multiple places in the patch.


Yes, I thought of adding wait event only for the sync but then recording the
wait event for both write and sync. I understand that OS level writes are
cheap but it still have some cost attached to that. Also I thought for the
monitoring tool being develop using this wait events, will have more useful
capture data if we try to collect as much info as we can. Or may be not.

I am open for other opinion/suggestions.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



--
Rushabh Lathia

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: [HACKERS] Partitioned tables and relfilenode
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Partitioned tables and relfilenode