Re: silent data loss with ext4 / all current versions - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Re: silent data loss with ext4 / all current versions
Msg-id 56584DFA.6090101@sigaev.ru
In response to silent data loss with ext4 / all current versions  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
> What happens is that when we recycle WAL segments, we rename them and then sync
> them using fdatasync (which is the default on Linux). However fdatasync does not
> force fsync on the parent directory, so in case of power failure the rename may
> get lost. The recovery won't realize those segments actually contain changes
Agreed. I ran into this some time ago, although it wasn't in Postgres.

> So, what's going on? The problem is that while the rename() is atomic, it's not
> guaranteed to be durable without an explicit fsync on the parent directory. And
> by default we only do fdatasync on the recycled segments, which may not force
> fsync on the directory (and ext4 does not do that, apparently).
>
> This impacts all current kernels (tested on 2.6.32.68, 4.0.5 and 4.4-rc1), and
> also all supported PostgreSQL versions (tested on 9.1.19, but I believe all
> versions since spread checkpoints were introduced are vulnerable).
>
> FWIW this has nothing to do with storage reliability - you may have good drives,
> RAID controller with BBU, reliable SSDs or whatever, and you're still not safe.
> This issue is at the filesystem level, not storage.
Agreed again.
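
For illustration only, here is a minimal sketch of the pattern described in the quoted text: rename a file, sync the file itself, then explicitly fsync the parent directory so the new directory entry survives a power failure. It is written in Python and assumes Linux/POSIX, where a directory can be opened read-only and passed to fsync. The function name durable_rename and the file names are hypothetical, made up for this example; this is not the PostgreSQL code or an actual patch.

```python
import os
import tempfile


def durable_rename(src, dst):
    """Rename src to dst, then fsync the file and the affected directories.

    Hypothetical helper for illustration; assumes Linux/POSIX semantics.
    """
    os.rename(src, dst)  # atomic, but not necessarily durable on its own

    # Sync the file contents (roughly what syncing the segment achieves).
    fd = os.open(dst, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

    # Sync the parent directory (or both, if src and dst live in
    # different directories) so the rename itself reaches stable storage.
    parents = {
        os.path.dirname(os.path.abspath(src)),
        os.path.dirname(os.path.abspath(dst)),
    }
    for parent in parents:
        dir_fd = os.open(parent, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


if __name__ == "__main__":
    # Small demo in a temporary directory with made-up file names.
    workdir = tempfile.mkdtemp()
    old_name = os.path.join(workdir, "old_segment")
    new_name = os.path.join(workdir, "recycled_segment")
    with open(old_name, "wb") as f:
        f.write(b"\0" * 8192)
    durable_rename(old_name, new_name)
    print(os.listdir(workdir))
```

Without the directory fsync step, the file data can be on disk while the rename is still only in memory, which is the failure mode discussed above.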

> I plan to do more power failure testing soon, with more complex test scenarios.
> I suspect there might be other similar issues (e.g. when we rename a file before
> a checkpoint and don't fsync the directory - then the rename won't be replayed
> and will be lost).
That would be very useful, though I hope you won't find any new bugs :)

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


