Re: silent data loss with ext4 / all current versions - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: silent data loss with ext4 / all current versions |
Date | |
Msg-id | 20160308031809.s4mkpuldd7zc4iha@alap3.anarazel.de Whole thread Raw |
In response to | Re: silent data loss with ext4 / all current versions (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: silent data loss with ext4 / all current versions
Re: silent data loss with ext4 / all current versions |
List | pgsql-hackers |
Hi, On 2016-03-08 12:01:18 +0900, Michael Paquier wrote: > I have spent a couple of hours looking at that in details, and the > patch is neat. Cool. Doing some more polishing right now. Will be back with an updated version soonish. Did you do some testing? > + * This routine ensures that, after returning, the effect of renaming file > + * persists in case of a crash. A crash while this routine is running will > + * leave you with either the old, or the new file. > "you" is not really Postgres-like, "the server" or "the backend" perhaps? Hm. I think your alternative proposals are more awkward. > + /* XXX: perform close() before? might be outside a > transaction. Consider errno! */ > ereport(elevel, > (errcode_for_file_access(), > errmsg("could not fsync file \"%s\": %m", fname))); > + (void) CloseTransientFile(fd); > + return -1; > close() should be called before. slot.c for example does so and we > don't want to link an fd here in case of elevel >= ERROR. Note that the transient file machinery will normally prevent fd leakage - but it can only do so if called in a transaction context. I've added int save_errno; /* close file upon error, might not be in transaction context */ save_errno = errno; CloseTransientFile(fd); errno = save_errno; stanzas. > + * It does so by using fsync on the sourcefile before the rename, and the > + * target file and directory after. > fsync is issued as well on the target file if it exists. I think > that's worth mentioning in the header. Ok. > + /* XXX: Add racy file existence check? */ > + if (rename(oldfile, newfile) < 0) > I am not sure we should worry about that, what do you think could > cause the old file from going missing all of a sudden. Other backend > processes are not playing with it in the code paths where this routine > is called. Perhaps adding a comment in the header to let users know > that would help? What I'm thinking of is adding a check whether the *target* file already exists, and error out in that case. Just like the link() based path normally does. > Instead of "durable" I think that "persistent" makes more sense. I find durable a lot more descriptive. persistent could refer to retrying the rename or something. > We > want to make those renames persistent on disk on case of a crash. So I > would suggest the following routine names: > - rename_persistent > - rename_or_link_persistent > Having the verb first also helps identifying that this is a > system-level, rename()-like, routine. I prefer the current names. > > I sure wish we had the recovery testing et al. in all the back > > branches... > > Sure, what we have now is something that should really be backpatched, > I was just waiting to have all the existing stability issues > addressed, the last one on my agenda being the failure of hamster for > test 005 I mentioned in another thread before sending patches for > other branches. I proposed a couple of potential regarding that > actually, see here: > http://www.postgresql.org/message-id/CAB7nPqSAZ9HnUcMoUa30JO2wJ8MnREm18p2a7McRA-ZrJxj3Vw@mail.gmail.com Yea. Will be an interesting discussion... Anyway, I did run the patch through the existing checks, after enabling fsync in PostgresNode.pm. Greetings, Andres Freund
pgsql-hackers by date: