Thread: pgsql: Recursively fsync() the data directory after a crash.
Recursively fsync() the data directory after a crash. Otherwise, if there's another crash, some writes from after the first crash might make it to disk while writes from before the crash fail to make it to disk. This could lead to data corruption. Back-patch to all supported versions. Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised by me. Branch ------ master Details ------- http://git.postgresql.org/pg/commitdiff/2ce439f3379aed857517c8ce207485655000fc8e Modified Files -------------- src/backend/access/transam/xlog.c | 42 ++++++++++++++ src/backend/storage/file/fd.c | 115 +++++++++++++++++++++++++++++++++++++ src/include/storage/fd.h | 2 + 3 files changed, 159 insertions(+)
On 05/04/2015 02:23 PM, Robert Haas wrote: > Recursively fsync() the data directory after a crash. > > Otherwise, if there's another crash, some writes from after the first > crash might make it to disk while writes from before the crash fail > to make it to disk. This could lead to data corruption. > > Back-patch to all supported versions. > This appears to have broken Windows builds. At least it's broken by using the two argument form of open instead of the three argument form, for which we have a #define in the win32 case, and pg_win32_is_junction is a typo - the first _ should not be there. cheers andrew
On Mon, May 4, 2015 at 11:37 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > On 05/04/2015 02:23 PM, Robert Haas wrote: >> Recursively fsync() the data directory after a crash. >> >> Otherwise, if there's another crash, some writes from after the first >> crash might make it to disk while writes from before the crash fail >> to make it to disk. This could lead to data corruption. >> >> Back-patch to all supported versions. >> > > This appears to have broken Windows builds. At least it's broken by using > the two argument form of open instead of the three argument form, for which > we have a #define in the win32 case, and pg_win32_is_junction is a typo - > the first _ should not be there. Sorry about that. Working on it now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 5/4/15 2:23 PM, Robert Haas wrote: > Recursively fsync() the data directory after a crash. > > Otherwise, if there's another crash, some writes from after the first > crash might make it to disk while writes from before the crash fail > to make it to disk. This could lead to data corruption. If there a reason why in pre_sync_fname(), the error message does not contain a %m?
On Sun, May 17, 2015 at 9:44 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On 5/4/15 2:23 PM, Robert Haas wrote: >> Recursively fsync() the data directory after a crash. >> >> Otherwise, if there's another crash, some writes from after the first >> crash might make it to disk while writes from before the crash fail >> to make it to disk. This could lead to data corruption. > > If there a reason why in pre_sync_fname(), the error message does not > contain a %m? For consistency with the rest of the file, I suppose. Not sure why it's like that, but all the functions in the file do it that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, May 17, 2015 at 9:44 PM, Peter Eisentraut <peter_e@gmx.net> wrote: >> If there a reason why in pre_sync_fname(), the error message does not >> contain a %m? > For consistency with the rest of the file, I suppose. Not sure why > it's like that, but all the functions in the file do it that way. Huh? All the ones that are reporting kernel call failures certainly have %m. regards, tom lane
On Mon, May 18, 2015 at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, May 17, 2015 at 9:44 PM, Peter Eisentraut <peter_e@gmx.net> wrote: >>> If there a reason why in pre_sync_fname(), the error message does not >>> contain a %m? > >> For consistency with the rest of the file, I suppose. Not sure why >> it's like that, but all the functions in the file do it that way. > > Huh? All the ones that are reporting kernel call failures certainly > have %m. Oops, you're right. I was looking at the wrong code. Yeah, that should probably be fixed. I'm not sure it's a good idea to do that right now though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, May 18, 2015 at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Huh? All the ones that are reporting kernel call failures certainly >> have %m. > Oops, you're right. I was looking at the wrong code. Yeah, that > should probably be fixed. I'm not sure it's a good idea to do that > right now though. It's a trivial enough change that I wouldn't see a problem with doing it now. But if you do, please get it in by say 2PM EDT. regards, tom lane
On Mon, May 18, 2015 at 12:15:03PM -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Mon, May 18, 2015 at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Huh? All the ones that are reporting kernel call failures certainly > >> have %m. > > > Oops, you're right. I was looking at the wrong code. Yeah, that > > should probably be fixed. I'm not sure it's a good idea to do that > > right now though. > > It's a trivial enough change that I wouldn't see a problem with doing it > now. But if you do, please get it in by say 2PM EDT. Uh, doesn't that change the translated strings? Is that a problem? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Bruce Momjian <bruce@momjian.us> writes: > On Mon, May 18, 2015 at 12:15:03PM -0400, Tom Lane wrote: >> It's a trivial enough change that I wouldn't see a problem with doing it >> now. But if you do, please get it in by say 2PM EDT. > Uh, doesn't that change the translated strings? Is that a problem? Better an untranslated message than a translated one that lacks critical info. But anyway, there are likely other instances of that same string *with* %m ... regards, tom lane
On Mon, May 18, 2015 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bruce Momjian <bruce@momjian.us> writes: >> On Mon, May 18, 2015 at 12:15:03PM -0400, Tom Lane wrote: >>> It's a trivial enough change that I wouldn't see a problem with doing it >>> now. But if you do, please get it in by say 2PM EDT. > >> Uh, doesn't that change the translated strings? Is that a problem? > > Better an untranslated message than a translated one that lacks critical > info. But anyway, there are likely other instances of that same string > *with* %m ... Probably not, because of the way the message is worded. But we could do this: diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index 6fa75d1..bed8478 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -2461,7 +2461,7 @@ pre_sync_fname(char *fname, bool isdir) if (fd < 0) ereport(FATAL, - (errmsg("could not open file \"%s\" before fsync", + (errmsg("could not open file \"%s\": %m", fname))); pg_flush_data(fd, 0, 0); Does that sound good? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, May 18, 2015 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Better an untranslated message than a translated one that lacks critical >> info. But anyway, there are likely other instances of that same string >> *with* %m ... > Probably not, because of the way the message is worded. But we could do this: > diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c > index 6fa75d1..bed8478 100644 > --- a/src/backend/storage/file/fd.c > +++ b/src/backend/storage/file/fd.c > @@ -2461,7 +2461,7 @@ pre_sync_fname(char *fname, bool isdir) > if (fd < 0) > ereport(FATAL, > - (errmsg("could not open file \"%s\" > before fsync", > + (errmsg("could not open file \"%s\": %m", > fname))); > pg_flush_data(fd, 0, 0); > Does that sound good? Works for me. The file/line info for the message would provide the "before fsync" context anyway, if someone needed it. regards, tom lane
Robert Haas wrote: > On Mon, May 18, 2015 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Uh, doesn't that change the translated strings? Is that a problem? > > > > Better an untranslated message than a translated one that lacks critical > > info. But anyway, there are likely other instances of that same string > > *with* %m ... > > Probably not, because of the way the message is worded. But we could do this: > > diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c > index 6fa75d1..bed8478 100644 > --- a/src/backend/storage/file/fd.c > +++ b/src/backend/storage/file/fd.c > @@ -2461,7 +2461,7 @@ pre_sync_fname(char *fname, bool isdir) > > if (fd < 0) > ereport(FATAL, > - (errmsg("could not open file \"%s\" > before fsync", > + (errmsg("could not open file \"%s\": %m", > fname))); > > pg_flush_data(fd, 0, 0); > > Does that sound good? +1. It's not like the extra two words of context would be of much use anyway. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, May 18, 2015 at 1:00 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Robert Haas wrote: >> On Mon, May 18, 2015 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> >> Uh, doesn't that change the translated strings? Is that a problem? >> > >> > Better an untranslated message than a translated one that lacks critical >> > info. But anyway, there are likely other instances of that same string >> > *with* %m ... >> >> Probably not, because of the way the message is worded. But we could do this: >> >> diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c >> index 6fa75d1..bed8478 100644 >> --- a/src/backend/storage/file/fd.c >> +++ b/src/backend/storage/file/fd.c >> @@ -2461,7 +2461,7 @@ pre_sync_fname(char *fname, bool isdir) >> >> if (fd < 0) >> ereport(FATAL, >> - (errmsg("could not open file \"%s\" >> before fsync", >> + (errmsg("could not open file \"%s\": %m", >> fname))); >> >> pg_flush_data(fd, 0, 0); >> >> Does that sound good? > > +1. It's not like the extra two words of context would be of much use > anyway. Done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company