Re: Refactoring the checkpointer's fsync request queue - Mailing list pgsql-hackers
From | Shawn Debnath |
---|---|
Subject | Re: Refactoring the checkpointer's fsync request queue |
Date | |
Msg-id | 20190227212653.GA7171@f01898859afd.ant.amazon.com Whole thread Raw |
In response to | Re: Refactoring the checkpointer's fsync request queue (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Refactoring the checkpointer's fsync request queue
|
List | pgsql-hackers |
On Fri, Feb 22, 2019 at 03:57:42PM -0800, Andres Freund wrote: > > Today, all the callbacks for smgr have their prototypes defined in > > smgr.h and used in smgr.c. Forward declarations within the new sync.h > > would be fine but having md callbacks split in two different places may > > not be the cleanest approach? One could argue they serve different > > purposes so perhaps it correct that we define them separately. I am > > fine with either, but now I probably prefer the new enum to fixed > > function declaration mappings that Andres suggested. I agree it would be > > less error prone. > > I'd really just entirely separate the two. I'd not include any knowledge > of this mechanism into smgr.h, and just make the expansion of paths > dispatch statically dispatched (potentially into md.c) to actually do > the work. I'm not sure why sync.h would need any forward declarations > for that? We had a quick offline discussion to get on the same page and we agreed to move forward with Andres' approach above. Attached is patch v10. Here's the overview of the patch: 1. Move the system for requesting and processing fsyncs out of md.c into storage/sync/sync.c with definitions in include/storage/sync.h. ProcessSyncRequests() is now responsible for processing the sync requests during checkpoint. 2. Removed the need for specific storage managers to implement pre and post checkpoint callbacks. These are now executed by the sync mechanism. 3. We now embed the fork number and the segment number as part of the hash key for the pending ops table. This eliminates the bitmapset based segment tracking for each relfilenode during fsync as not all storage managers may map their segments from zero. 4. Each sync request now must include a type: sync, forget, forget hierarchy, or unlink, and the owner who will be responsible for generating paths or matching forget requests. 5. For cancelling relation sync requests, we now must send a forget request for each fork and segment in the relation. 6. We do not rely on smgr to provide the file descriptor we use to issue fsync. Instead, we generate the full path based on the FileTag in the sync request and use PathNameOpenFile to get the file descriptor. Ran make check-world and repeated the tests described in [1]. The numbers show a 12% drop in total time for single run of 1000 clients and ~62% drop in total time for 10 parallel runs with 200 clients: [Requests Absorbed] Min Max Average Median Std Dev --------------- -------- -------- ----------- --------- ---------- patch-1x1000 17554 147410 85252.95 83835.5 21898.88 master-1x1000 25728 138422 81455.04 80601 21295.83 Min Max Average Median Std Dev patch-10x200 125922 279682 197098.76 197055 34038.25 master-10x200 191833 602512 416533.86 424946 82014.48 [Files Synced] Min Max Average Median Std Dev --------------- ------ ------ --------- -------- --------- patch-1x1000 155 213 158.93 159 2.97 master-1x1000 154 166 158.29 159 10.29 Min Max Average Median Std Dev patch-10x200 1456 1577 1546.84 1547 11.23 master-10x200 1546 1546 1546 1559 12.79 [Total Time in ProcessFsyncRequest/mdsync] Min Max Average Median Std Dev --------------- ------ --------- ---------- -------- --------- patch-1x1000 606 4022 2145.11 2123 440.72 master-1x1000 806 4430.32 2458.77 2382 497.01 Min Max Average Median Std Dev patch-10x200 2472 6960 4156.8 4141 817.56 master-10x200 4323 17858 10982.15 11154 2760.47 [1] https://www.postgresql.org/message-id/20190220232739.GA8280%40f01898859afd.ant.amazon.com -- Shawn Debnath Amazon Web Services (AWS)
Attachment
pgsql-hackers by date: