On 2015-01-14 11:43:00 +0200, Heikki Linnakangas wrote:
> No. The question is, should pg_rewind fsync() every file that it
> modifies?
Not immediately, but before the end, yes.
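As a rough sketch of "not immediately, but before the end": remember which paths were touched, then flush them all in one pass once the work is done. The helper names below are hypothetical and this is not pg_rewind code; it assumes POSIX open()/fsync() and leaves out syncing the containing directories, which a real implementation would also need.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush a single path to stable storage.
 * Assumption: a read-only descriptor is good enough for fsync(); that
 * holds on Linux, but some platforms want a writable descriptor for
 * regular files.
 */
static int
fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;

    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Hypothetical helper: call once, after all modifications are finished.
 * Returns the number of paths that could not be synced.
 */
static int
fsync_modified_files(const char *const *paths, size_t npaths)
{
    int     failures = 0;
    size_t  i;

    for (i = 0; i < npaths; i++)
    {
        if (fsync_path(paths[i]) != 0)
        {
            perror(paths[i]);
            failures++;
        }
    }
    return failures;
}
```

The point of deferring is that the kernel gets to write things back in the background while the tool is still working, so the final pass mostly waits for I/O that is already in flight.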
> It would be a reasonable thing to do, to make sure that if you crash
> immediately after running pg_rewind, the cluster is in a consistent state.
> However, pg_basebackup for example doesn't do that. initdb does, but that
> was added fairly recently.
initdb -S can be used for this if you want to - that's why Bruce added
it. It has only worked correctly for tablespaces for a couple of weeks,
however.
> I'm not thrilled about sprinkling fsync() calls everywhere that we touch
> files. But I guess that would be the right thing to do. I'm planning to do
> that as an add-on patch later, fixing also pg_basebackup and any other
> utilities that need it.
Yea, we really need to do this. We also need it in the server; right now
there's a bunch of rather ugly corner cases where we rely on files that
were never fsynced being present after e.g. two consecutive crashes.
Abhijit has sent a patch.
> >+struct filemap_t
> >+{
> >+ /*
> >+ * New entries are accumulated to a linked list, in process_remote_file
> >+ * and process_local_file.
> >+ */
> >+ file_entry_t *first;
> >+ file_entry_t *last;
> >+ int nlist;
> >
> >Uh, this is like the seventh open-coded list implementation in frontend
> >code. Can't we have this using ilist for a change?
>
> ilist is backend code. I'm not eager to move it to src/common. A linked list
> is a trivial data structure, I don't think it's too bad to re-invent that
> wheel.
Not a fan. The number of bugs in open-coded list manipulation tends to
be high.
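To illustrate the alternative to open-coding first/last/nlist by hand, here is a minimal intrusive doubly linked list with a sentinel head. It is only a sketch in the spirit of ilist; the names are made up for this example and are not the ilist.h API, and example_entry is a hypothetical stand-in for the real entry struct.

```c
#include <stddef.h>

/* Link node embedded in each element (hypothetical names throughout). */
typedef struct list_node
{
    struct list_node *prev;
    struct list_node *next;
} list_node;

/* Circular list; the sentinel means no NULL checks on insert. */
typedef struct list_head
{
    list_node   head;
} list_head;

/* Recover the containing struct from a pointer to its embedded node. */
#define list_container(type, member, ptr) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

static void
list_init(list_head *l)
{
    l->head.prev = l->head.next = &l->head;
}

static int
list_is_empty(const list_head *l)
{
    return l->head.next == &l->head;
}

/* Append at the tail; same code path for empty and non-empty lists. */
static void
list_push_tail(list_head *l, list_node *n)
{
    n->next = &l->head;
    n->prev = l->head.prev;
    l->head.prev->next = n;
    l->head.prev = n;
}

/* Hypothetical element type, standing in for the real file entry. */
typedef struct example_entry
{
    const char *path;
    list_node   node;
} example_entry;

/* Walk the list and count the elements. */
static int
list_count(const list_head *l)
{
    const list_node *cur;
    int         n = 0;

    for (cur = l->head.next; cur != &l->head; cur = cur->next)
        n++;
    return n;
}
```

With the sentinel there is no special case for the first element and no separate "last" pointer to keep in sync, which is where hand-rolled lists usually go wrong.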
Greetings,
Andres Freund