Review: compact fsync request queue on overflow - Mailing list pgsql-hackers

From Chris Browne
Subject Review: compact fsync request queue on overflow
Date
Msg-id 87y66jmkvl.fsf_-_@cbbrowne.afilias-int.info
Whole thread Raw
In response to Re: Moving test_fsync to /contrib?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Review: compact fsync request queue on overflow  (Robert Haas <robertmhaas@gmail.com>)
Re: Review: compact fsync request queue on overflow  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-hackers
I have been taking a peek at the following commitfest item:
https://commitfest.postgresql.org/action/patch_view?id=497

Submission:

- I had to trim a little off the end of the patch to apply it, but that's likely the fault of how I cut'n'pasted it. It
appliedcleanly against HEAD.
 

- I observe that it doesn't include any alterations to documentation or to regression tests.

Both aspects seem apropos, as the behaviour is entirely internal to
the backend. I wouldn't expect a GUC variable for this, or SQL
commands to control it.

Usability Review:

Does the patch actually implement that?
 - Establishes a hash table - Establishes skip slot array - Walks through all BGWriter requests   - Adds to hash
table.
     (I observe that it wasn't all that obvious that "hash_search()"     *adds* elements that are missing.  I got
confusedand went     looking for "hash_add() or similar.  It's permissible to say "dumb     Chris".)
 
   - If it's a collision, then mark collision in skip slot array, and     add to count
 - After the walk   - Clean up hash table   - If nothing found, clean up skip slot array, and return
   - If collisions found, then clear them out.
     Question: Is there any further cleanup needed for the entries     that got "dropped" out of
BGWriterShmem->requests? It seems     not, but a leak seems conceivable.
 

Do we want that?
     Eliminating a bunch of fsync() calls that are already being     induced by other backends seems like a good thing,
yep.

Do we already have it?
     Evidently not!

Does it follow SQL spec, or the community-agreed behavior?
     That doesn't seem relevant; this is well outside the scope of     what SQL spec should have to say.

Does it include pg_dump support (if applicable)?
     Definitely not applicable.

Are there dangers?
     Possibilities...
     - Mentioned in the patch is the possibility of processing the       set of requests in reverse order, which might
inprinciple       reduce work.  But there is some danger of this changing       semantics, so that reversal is not
done.
     - Concurrent access...
       Is there anything that can write extra elements to       BGWriterShmem->requests while this is running?
       I wonder if we need to have any sort of lock surrounding this?

Have all the bases been covered?
     It is a comparatively simple change, so I wouldn't think things     are missing.

Feature test:
  - Compiled and ran regression test; no problems found.
  Need to do...   - Validate it works as advertised     - Hook up pgbench     - Turn on DEBUG1 level     - Watch that
"compactedfsync request queue from %d entries to %d entries" come up
 
     It was a little troublesome inducing it.  I did so by cutting     shared memory to minimum (128kB).
     I'd regularly get entries like the following:  (Note that I     changed the error level to WARNING to induce
loggingthis without     getting all sorts of other stuff).
 

CONTEXT:  writing block 1735 of relation base/11933/16396
WARNING:  compacted fsync request queue from 16 entries to 3 entries - lost [13] entries
CONTEXT:  writing block 14 of relation base/11933/16387
WARNING:  compacted fsync request queue from 16 entries to 3 entries - lost [13] entries
CONTEXT:  writing block 4 of relation base/11933/16387
WARNING:  compacted fsync request queue from 16 entries to 3 entries - lost [13] entries
CONTEXT:  writing block 6 of relation base/11933/16387
WARNING:  compacted fsync request queue from 16 entries to 3 entries - lost [13] entries
CONTEXT:  writing block 1625 of relation base/11933/16396
WARNING:  compacted fsync request queue from 16 entries to 4 entries - lost [12] entries
CONTEXT:  writing block 880 of relation base/11933/16396
WARNING:  compacted fsync request queue from 16 entries to 4 entries - lost [12] entries
CONTEXT:  writing block 133 of relation base/11933/16396
     With higher shared memory, I couldn't readily induce compaction,     which is probably a concurrency matter of not
havingenough volume     of concurrent work going on.
 
   - Corner cases?
     It's already a corner case ;-).
   - Assertion failures?
     None seen thus far.

Performance test
   - Does it slow down simple cases?
     It surely shouldn't; compaction is only considered if the fsync     queue is larger than the number of shared
buffers. That doesn't     seem like a "simple case" to me!
 
   - Does it improve performance?
     I haven't been able to induce it at a level that would make the     improvement visible.  But a database that is
busyenough to have a     'full' fsync queue should surely be helped by reducing the number     of fsync requests.
 
   - Does it slow down other things?
     In principle, the only case where it should worsen performance     is if the amount of time required to:       -
Setup a hash table       - Insert an entry for each buffer       - Walk the skip_slot array, shortening the request
queue        for each duplicate found     exceeded the amount of time required to do the duplicate fsync()
requests.
     That cost should be mighty low.  It would be interesting to     instrument CompactBgwriterRequestQueue() to see
howlong it runs.
 
     But note that this cost is also spread into a direction where it     likely wouldn't matter, as it is typically
invokedby the     background writer process, so this would frequently not be paid     by "on-line" active processes.
 

Coding review- guidelines- portability, Windows/BSD
     I can't speak to Windows' handling of fsync(), but observe that     the existing code works there, so it seems
unlikelythat a     shortening of data that presently works would cease to work.
 
- Sufficient comments?       Seems so.- Does it do what it says, accurately?       I think so.- Compiler warnings?

bgwriter.c: In function 'CompactBgwriterRequestQueue':
bgwriter.c:1142: warning: 'num_skipped' may be used uninitialized in this function
- Can I induce a crash?      By making changes to the code that corrupt things ;-).

Architecture- coherent with other things?
  Generally a minor change, which is colocated with other background  writer request code.  Seems fine in that regard.
- problematic interdependencies?
  Surely seems unlikely.

Tentative Conclusions:- Seems pretty good so far.- I'd like to see the compiler warning fixed; should be as simple as
assigningnum_skipped before it is used.
 
-- 
let name="cbbrowne" and tld="gmail.com" in name ^ "@" ^ tld;;
http://linuxfinances.info/info/spreadsheets.html
I am not a number!
I am a free man!


pgsql-hackers by date:

Previous
From: Greg Smith
Date:
Subject: Re: Spread checkpoint sync
Next
From: Tom Lane
Date:
Subject: Re: Replication logging