Re: Refactoring the checkpointer's fsync request queue - Mailing list pgsql-hackers

From Shawn Debnath
Subject Re: Refactoring the checkpointer's fsync request queue
Date
Msg-id 20190227212653.GA7171@f01898859afd.ant.amazon.com
Whole thread Raw
In response to Re: Refactoring the checkpointer's fsync request queue  (Andres Freund <andres@anarazel.de>)
Responses Re: Refactoring the checkpointer's fsync request queue
List pgsql-hackers
On Fri, Feb 22, 2019 at 03:57:42PM -0800, Andres Freund wrote:
> > Today, all the callbacks for smgr have their prototypes defined in
> > smgr.h and used in smgr.c.  Forward declarations within the new sync.h
> > would be fine but having md callbacks split in two different places may
> > not be the cleanest approach? One could argue they serve different
> > purposes so perhaps it correct that we define them separately.  I am
> > fine with either, but now I probably prefer the new enum to fixed
> > function declaration mappings that Andres suggested. I agree it would be
> > less error prone.
> 
> I'd really just entirely separate the two. I'd not include any knowledge
> of this mechanism into smgr.h, and just make the expansion of paths
> dispatch statically dispatched (potentially into md.c) to actually do
> the work.  I'm not sure why sync.h would need any forward declarations
> for that?

We had a quick offline discussion to get on the same page and we agreed 
to move forward with Andres' approach above. Attached is patch v10.  
Here's the overview of the patch:

  1. Move the system for requesting and processing fsyncs out of md.c
  into storage/sync/sync.c with definitions in include/storage/sync.h.
  ProcessSyncRequests() is now responsible for processing the sync
  requests during checkpoint.

  2. Removed the need for specific storage managers to implement pre and
  post checkpoint callbacks. These are now executed by the sync mechanism.

  3. We now embed the fork number and the segment number as part of the
  hash key for the pending ops table. This eliminates the bitmapset based
  segment tracking for each relfilenode during fsync as not all storage
  managers may map their segments from zero.

  4. Each sync request now must include a type: sync, forget, forget
  hierarchy, or unlink, and the owner who will be responsible for
  generating paths or matching forget requests.

  5. For cancelling relation sync requests, we now must send a forget
  request for each fork and segment in the relation.

  6. We do not rely on smgr to provide the file descriptor we use to
  issue fsync. Instead, we generate the full path based on the FileTag
  in the sync request and use PathNameOpenFile to get the file descriptor.


Ran make check-world and repeated the tests described in [1]. The 
numbers show a 12% drop in total time for single run of 1000 clients and 
~62% drop in total time for 10 parallel runs with 200 clients:

[Requests Absorbed]

                   Min      Max      Average    Median    Std Dev
 --------------- -------- -------- ----------- --------- ----------
  patch-1x1000    17554    147410   85252.95    83835.5   21898.88
  master-1x1000   25728    138422   81455.04    80601     21295.83

                  Min      Max      Average     Median    Std Dev
  patch-10x200    125922   279682   197098.76   197055    34038.25
  master-10x200   191833   602512   416533.86   424946    82014.48


[Files Synced]

                  Min    Max    Average   Median   Std Dev
 --------------- ------ ------ --------- -------- ---------
  patch-1x1000    155    213    158.93    159      2.97
  master-1x1000   154    166    158.29    159      10.29

                  Min    Max    Average   Median   Std Dev
  patch-10x200    1456   1577   1546.84   1547     11.23
  master-10x200   1546   1546   1546      1559     12.79


[Total Time in ProcessFsyncRequest/mdsync]

                  Min      Max     Average    Median   Std Dev
 --------------- ------ --------- ---------- -------- ---------
  patch-1x1000    606    4022      2145.11    2123     440.72
  master-1x1000   806    4430.32   2458.77    2382     497.01

                  Min    Max       Average    Median   Std Dev
  patch-10x200    2472   6960      4156.8     4141     817.56
  master-10x200   4323   17858     10982.15   11154    2760.47


[1] 
https://www.postgresql.org/message-id/20190220232739.GA8280%40f01898859afd.ant.amazon.com

-- 
Shawn Debnath
Amazon Web Services (AWS)

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: POC: converting Lists into arrays
Next
From: Tom Lane
Date:
Subject: Re: Why don't we have a small reserved OID range for patch revisions?