On 2016-04-28 17:41:29 +0100, Thom Brown wrote:
> I've noticed another breakage, which I can reproduce consistently.
> 2016-04-28 17:36:08 BST [18108]: [47-1] user=,db=,client= DEBUG: could not fsync file "base/24581/24594.1" but retrying: No such file or directory
> 2016-04-28 17:36:08 BST [18108]: [48-1] user=,db=,client= ERROR: could not fsync file "base/24581/24594.1": No such file or directory
> 2016-04-28 17:36:08 BST [18605]: [17-1] user=thom,db=postgres,client=[local] ERROR: checkpoint request failed
> 2016-04-28 17:36:08 BST [18605]: [18-1] user=thom,db=postgres,client=[local] HINT: Consult recent messages in the server log for details.
Yuck. md.c is so crummy :(
Basically, the reason for the problem is that mdsync() needs to access
"formally non-existent segments" (i.e. ones where a previous segment has
fewer than RELSEG_SIZE blocks), because mdtruncate() queues fsync
requests for them via register_dirty_segment(), and there might also be
preexisting requests.
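To make the failure mode concrete, here is a toy model. It is not PostgreSQL code: RELSEG_SIZE is shrunk to 4 blocks, segment files are just an array of block counts, and all toy_* names are hypothetical. The lookup refuses to walk past a short segment, the truncation queues an fsync request for every segment it touches, and the sync pass then cannot reach the requests for segments sitting behind the new short one:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: segment "files" are entries in seg_blocks[],
 * with -1 meaning the file does not exist at all. */
#define TOY_RELSEG_SIZE 4
#define TOY_MAX_SEGS 8

typedef struct ToyRelation
{
    int seg_blocks[TOY_MAX_SEGS];   /* blocks per segment file, -1 = absent */
} ToyRelation;

/* Hypothetical stand-in for a segment lookup that, like the behaviour
 * discussed in this thread, will not walk past a segment holding fewer
 * than RELSEG_SIZE blocks. Returns true if segno is reachable. */
static bool
toy_getseg(const ToyRelation *rel, int segno)
{
    assert(segno >= 0 && segno < TOY_MAX_SEGS);
    for (int i = 0; i < segno; i++)
    {
        if (rel->seg_blocks[i] < TOY_RELSEG_SIZE)
            return false;   /* earlier segment is short: segno "doesn't exist" */
    }
    return rel->seg_blocks[segno] >= 0;
}

/* Truncate to nblocks: the segment containing the new end keeps its
 * partial contents, later segments are cut to zero length but stay on
 * disk, and each shrunk segment gets a queued fsync request (modelling
 * what register_dirty_segment() is described as doing in mdtruncate()). */
static int
toy_truncate(ToyRelation *rel, int nblocks, int pending[], int npending)
{
    for (int segno = 0; segno < TOY_MAX_SEGS; segno++)
    {
        if (rel->seg_blocks[segno] < 0)
            break;                      /* no more segment files */
        int keep = nblocks - segno * TOY_RELSEG_SIZE;
        if (keep >= rel->seg_blocks[segno])
            continue;                   /* segment not affected */
        if (keep < 0)
            keep = 0;
        rel->seg_blocks[segno] = keep;
        assert(npending < TOY_MAX_SEGS);
        pending[npending++] = segno;    /* fsync request queued */
    }
    return npending;
}

/* Simplified sync pass: counts queued requests whose segment the
 * lookup cannot reach (the "could not fsync file" case). */
static int
toy_sync(const ToyRelation *rel, const int pending[], int npending)
{
    int failures = 0;

    for (int i = 0; i < npending; i++)
        if (!toy_getseg(rel, pending[i]))
            failures++;
    return failures;
}
```

Truncating a 10-block relation (segments {4, 4, 2}) to 3 blocks queues requests for segments 0, 1 and 2; segment 0 is now short, so the sync pass fails on the requests for segments 1 and 2 even though both files are still on disk.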
I'm a bit at a loss as to how to reconcile that view with the original
issue in this thread. The best I can come up with at the moment is doing
a _mdfd_openseg() in mdsync() to open the truncated segment if
_mdfd_getseg() returned NULL. We don't want to use that normally in
either function, because it implies a separate open() etc., which is
pretty expensive, but doing so in the fallback case would be kind of OK.
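A sketch of that fallback in a toy model (not PostgreSQL code: segment files are an array of block counts, RELSEG_SIZE is shrunk to 4, and toy_getseg()/toy_openseg() are hypothetical stand-ins for _mdfd_getseg()/_mdfd_openseg(), not their real signatures). The cheap lookup is tried first, and only when it gives up is the segment opened directly, with a counter standing in for the cost of the extra open():

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RELSEG_SIZE 4
#define TOY_MAX_SEGS 8

typedef struct ToyRelation
{
    int seg_blocks[TOY_MAX_SEGS];   /* blocks per segment file, -1 = no file */
} ToyRelation;

/* Cheap path (hypothetical _mdfd_getseg() stand-in): reaches segno only
 * if every earlier segment is full. */
static bool
toy_getseg(const ToyRelation *rel, int segno)
{
    for (int i = 0; i < segno; i++)
        if (rel->seg_blocks[i] < TOY_RELSEG_SIZE)
            return false;
    return rel->seg_blocks[segno] >= 0;
}

/* Expensive path (hypothetical _mdfd_openseg() stand-in): opens the file
 * for segno directly, ignoring earlier segments. In the real code this
 * would be a separate open(), so each call is counted. */
static bool
toy_openseg(const ToyRelation *rel, int segno, int *nopens)
{
    (*nopens)++;
    return rel->seg_blocks[segno] >= 0;
}

/* Sync pass with the fallback: use the cheap lookup, and only open the
 * segment directly when the lookup gives up. Returns the number of
 * requests that still failed. */
static int
toy_sync_with_fallback(const ToyRelation *rel, const int pending[],
                       int npending, int *nopens)
{
    int failures = 0;

    for (int i = 0; i < npending; i++)
    {
        int segno = pending[i];

        assert(segno >= 0 && segno < TOY_MAX_SEGS);
        if (toy_getseg(rel, segno))
            continue;               /* common case: no extra open() */
        if (!toy_openseg(rel, segno, nopens))
            failures++;             /* the file really is gone */
    }
    return failures;
}
```

With segments {3, 0, 0} and pending requests for segments 0, 1 and 2, all three succeed, and only the two requests behind the short segment pay for a direct open; a request for a segment whose file is really missing still fails.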
Andres