Re: Refactoring the checkpointer's fsync request queue - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Refactoring the checkpointer's fsync request queue
Date
Msg-id CA+hUKGKF7QvUi2ST4R5vd_PVdcztBTXB23ZJwoj_ASoujrbsmQ@mail.gmail.com
Whole thread Raw
In response to Re: Refactoring the checkpointer's fsync request queue  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Refactoring the checkpointer's fsync request queue
List pgsql-hackers
On Thu, Feb 28, 2019 at 10:43 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Thu, Feb 28, 2019 at 10:27 AM Shawn Debnath <sdn@amazon.com> wrote:
> > We had a quick offline discussion to get on the same page and we agreed
> > to move forward with Andres' approach above. Attached is patch v10.
> > Here's the overview of the patch:
>
> Thanks.  I will review, and try to rebase my undo patches on top of
> this and see what problems I crash into.

Some initial feedback:

@@ -8616,7 +8617,7 @@ CreateCheckPoint(int flags)
         * the REDO pointer.  Note that smgr must not do anything that'd have to
         * be undone if we decide no checkpoint is needed.
         */
-       smgrpreckpt();
+       PreCheckpoint();

I would call this and the "post" variant something like
SyncPreCheckpoint().  Otherwise it's too general sounding and not
clear which module it's in.

+static const int NSync = lengthof(syncsw);

Unused.

+                       fd = PathNameOpenFile(path, O_RDWR | PG_BINARY);

Needs to be closed.

+                       path = syncsw[entry->owner].sync_filepath(entry->ftag);

Probably doesn't make much difference, but wouldn't it be better for
the path to be written into a caller-supplied buffer of size
MAXPGPATH?  Then we could have that on the stack instead of alloc/free
for every path.

Hmm, mdfilepath() needs to use GetRelationPath(), and that already
returns palloc'd memory.  Oh well.

+                               entry->canceled = true;

Now that we killed the bitmapset, I wonder if we still need this
canceled flag.  What if we just removed the entry from the hash table?
If you killed the canceled flag you could then replace this:

+                       if (entry->canceled)
+                               break;

.. with another hash table probe to see if the entry went in the
AbsorbSyncRequests() call (having first copied the key into a local
variable since of course "entry" may have been freed).  Or maybe you
don't think that's better, I dunno, just an idea :-)

+ForwardSyncRequest(FileTag ftag, SyncRequestType type, SyncRequestOwner owner)

Is it a deliberate choice that you pass FileTag objects around by
value?  Rather than, say, pointer to const.  Not really a complaint in
the current coding since it's a small object anyway (not much bigger
than a pointer), but I guess someone might eventually want to make it
into a flexible sized object, or something, I dunno.

-ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum)
+ForgetRelationSyncRequests(RelFileNode rnode, ForkNumber forknum,
+                                                         SegmentNumber segno)
 {
-       if (pendingOpsTable)
+       FileTag tag;
+
+       tag.rnode = rnode;
+       tag.forknum = forknum;
+       tag.segno = segno;
+
+       if (IsSyncManagedLocally())
        {
                /* standalone backend or startup process: fsync state
is local */
-               RememberFsyncRequest(rnode, forknum, FORGET_RELATION_FSYNC);
+               RememberSyncRequest(tag, FORGET_REQUEST, SYNC_MD);
        }
...

You left this and similar functions in md.c, but I think it needs to
move out to sync.c, and just take a FileTag directly.  Otherwise I
have to write similar functions in undofile.c, and it seems kinda
weird that those modules are worrying about whether sync is managed
locally or the message needs to be sent to the checkpointer, and even
worse, they have to duplicate the loop that deals with
ForwardSyncRequest() failing and retrying.  Concretely I'm saying that
sync.c should define a function like this:

+/*
+ * PostSyncRequest
+ *
+ * Remember locally, or post to checkpointer as appropriate.
+ */
+void
+PostSyncRequest(FileTag tag, SyncRequestType type, SyncRequestOwner owner)
+{
+       if (IsSyncManagedLocally())
+       {
+               /* standalone backend or startup process: fsync state
is local */
+               RememberSyncRequest(tag, type, owner);
+       }
+       else if (IsUnderPostmaster)
+       {
+               while (!ForwardSyncRequest(tag, FORGET_REQUEST, SYNC_MD))
+                       pg_usleep(10000L);      /* 10 msec seems a
good number */
+       }
+}

Hmm, perhaps it would need to take an argument to say whether it
should keep retrying or return false if it fails; that way
register_dirty_segment() could perform the FileSync() itself if the
queue is full, but register_unlink could tell it to keep trying.  Does
this make sense?

+typedef enum syncrequesttype
+{
+       SYNC_REQUEST,
+       FORGET_REQUEST,
+       FORGET_HIERARCHY_REQUEST,
+       UNLINK_REQUEST
+} syncrequesttype;

These names are too generic for the global C namespace; how about
SYNC_REQ_CANCEL or similar?

+typedef enum syncrequestowner
+{
+       SYNC_MD = 0             /* md smgr */
+} syncrequestowner;

I have a feeling that Andres wanted to see a single enum combining
both the "operation" and the "owner", like SYNC_REQ_CANCEL_MD,
SYNC_REQ_CANCEL_UNDO, ... but I actually like it better the way you
have it.

+/* md callback forward declarations */
+extern char* mdfilepath(FileTag ftag);
+extern bool mdtagmatches(FileTag ftag, FileTag predicate,
SyncRequestType type);

It's weird that these ^ are declared in sync.h.  I think they should
be declared in a new header md.h and that should be included by
sync.c.  I know that we have this historical weird thing there md.c's
functions are declared in smgr.h, but we should eventually fix that.

More soon.

-- 
Thomas Munro
https://enterprisedb.com


pgsql-hackers by date:

Previous
From: Michael Meskes
Date:
Subject: Re: SQL statement PREPARE does not work in ECPG
Next
From: Etsuro Fujita
Date:
Subject: Re: Unneeded parallel safety tests in grouping_planner