Home > mailing lists

Re: Refactoring the checkpointer's fsync request queue - Mailing list pgsql-hackers

From	Shawn Debnath
Subject	Re: Refactoring the checkpointer's fsync request queue
Date	April 3, 2019 21:44:24
Msg-id	20190403214423.GA45392@f01898859afd.ant.amazon.com Whole thread Raw
In response to	Re: Refactoring the checkpointer's fsync request queue (Thomas Munro <thomas.munro@gmail.com>)
Responses	Re: Refactoring the checkpointer's fsync request queue
List	pgsql-hackers

Tree view

On Thu, Apr 04, 2019 at 01:08:31AM +1300, Thomas Munro wrote:
> On Tue, Apr 2, 2019 at 11:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > I'm going to do some more testing and tidying tomorrow (for example I
> > think the segment.h header is silly and I'd like to remove that), and
> > commit this.

Given the dislike in the thread for introducing  the concept of segments 
at any layer  higher than the storage manager itself, I thought it would 
be better to leave the existing header files alone and introduce a new 
one to separate the concept. I am fine either way we go.

> As a sanity check on the programming interface this thing gives you, I
> tried teaching the SLRUs to use the fsync queue.  I finished up making
> a few small improvements, but the main thing I learned is that
> "handler" needs to be part of the hash table key.  I suppose the
> discriminator could even be inside FileTag itself, but I chose to keep
> it separate and introduce a new struct to hold hander enum + FileTag
> in the hash table.

I think this is fine, but can you elaborate a bit more on why we need to 
include the handler in the key for the hash table? We are de-duping 
relfilenodes here and these should never collide with files from another 
component. The component separation would be encoded in the RelFileNode 
that the target smgr, or SLRU in this case, would be able to decipher.  
Do you foresee, or know of, use cases where FileTag alone will result in 
conflicts on the same file but from different handlers?

+/*
+ * Populate a file tag describing a segment file.  We only use the segment
+ * number, since we can derive everything else we need by having separate
+ * sync handler functions for clog, multixact etc.
+ */
+#define INIT_SLRUFILETAG(a,xx_segno) \
+( \
+    (a).rnode.spcNode = 0, \
+    (a).rnode.dbNode = 0, \
+    (a).rnode.relNode = 0, \
+    (a).forknum = 0, \
+    (a).segno = (xx_segno) \
+)

Based on the definition of INIT_SLRUFILETAG in your patch, it seems you 
are trying to only use the segno to identify the file. Not sure why we 
can't use other unused fields in FileTag to identify the component?  You 
could for example in the current SLRU implementation have the handler 
set to SYNC_HANDLER_SLRU when invoking RegisterSyncRequest() and use 
relNode to distinguish between each SLRU component in a wrapper function 
and call slrusyncfiletag() with the right SlruCtl.

For the SLRU to buffer cache work, I was planning on using the relNode 
field to identify which specific component this tag belongs to.  dbNode 
would be pointing to the type of smgr (as discussed earlier in the 
thread and still TBD).

I would prefer not to expand the hash key unnecessarily and given this 
isn't persisted, we can expand the key in the future if needed. Keeps 
the code simpler for now.

> The 0001 patch is what I'm going to commit soon.  I don't know why
> Shawn measured a performance change -- a bug in an earlier version? --
> I can't see it, but I'm planning to look into that a bit more first.
> I've attached the 0002 SLRU patch for interest, but I'm not planning
> to commit that one.

Question is which block of code did you measure? I can redo the 
instrumentation on the latest patch and re-validate and share the 
results. I previously measured the average time it took mdsync() and 
ProcessSyncRequests() in the patch to complete under similar workloads.

> I found a few more things that I thought needed adjustment:
> 
> * Packing handler and request type into a uint8 is cute but a waste of
> time if we're just going to put it in a struct next to a member that
> requires word-alignment.  So I changed it to a pair of plain old int16
> members.  The ftag member starts at offset 4 either way, on my system.

Good catch! For posterity, using packed attribute here would be bad well 
as it would result RelFileNode's spcNode in FileTag in subsequent 
entries to be misaligned given our usage of the requests as an array.  
This is potentially unsafe on platforms other than x86. Re-arranging the 
fields would lead to the same result. Thanks for catching and fixing 
this!

> * I didn't really like the use of the word HIERARCHY in the name of
> the request type, and changed it to SYNC_FILTER_REQUEST.  That word
> came up because we were implementing a kind of hierarchy, where if you
> drop a database you want to forget things for all segments inside all
> relations inside that database, but the whole point of this new API is
> that it doesn't understand that, it calls a filter function to decide
> which requests to keep.  So I preferred "filter" as a name for the
> type of request.

Yeah, I never liked the word hierarchy too much, but couldn't think of a 
better one either. Filter is perfect.

-- 
Shawn Debnath
Amazon Web Services (AWS)

pgsql-hackers by date:

From: Greg Stark
Date: 03 April 2019, 21:41:36
Subject: Re: Implementing Incremental View Maintenance

From: Alvaro Herrera
Date: 03 April 2019, 21:45:14
Subject: Re: Log a sample of transactions

Re: Refactoring the checkpointer's fsync request queue - Mailing list pgsql-hackers

Previous

Next