Re: Is pg_control file crashsafe? - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Is pg_control file crashsafe?
Date	May 5, 2016 07:17:47
Msg-id	CAA4eK1Kn62ZHHCsUrq+X_31t9wNfxNAW5TwbkWCvmJafTXQA=A@mail.gmail.com
In response to	Re: Is pg_control file crashsafe? (Thomas Munro <thomas.munro@enterprisedb.com>)
List	pgsql-hackers

On Thu, May 5, 2016 at 11:52 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> On Thu, May 5, 2016 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Amit Kapila <amit.kapila16@gmail.com> writes:
> >> How about using 512 bytes as a write size and perform direct writes rather
> >> than going via OS buffer cache for control file?
> >
> > Wouldn't that fail outright under a lot of implementations of direct write;
> > ie the request needs to be page-aligned, for some not-very-determinate
> > value of page size?
> >

Right, it should be at least the page size.

>
> > To repeat, I'm pretty hesitant to change this logic. While this is not
> > the first report we've ever heard of loss of pg_control, I believe I could
> > count those reports without running out of fingers on one hand --- and
> > that's counting since the last century. It will take quite a lot of
> > evidence to convince me that some other implementation will be more
> > reliable. If you just come and present a patch to use direct write, or
> > rename, or anything else for that matter, I'm going to reject it out of
> > hand unless you provide very strong evidence that it's going to be more
> > reliable than the current code across all the systems we support.
>
> I'm not sure how those ideas address the reported problem anyway: the
> *length* was unexpectedly zero after a crash. UpdateControlFile
> doesn't change the length of the control file, since it doesn't
> specify O_TRUNC or O_APPEND and it always writes the same size. So it
> seems like a pretty weird failure mode affecting filesystem metadata
> (which I wouldn't expect to change anyway, but I would expect to be
> journaled if it did), not a file-contents-atomicity problem. Whether
> or not the page cache is involved in a write to a preallocated file
> doesn't seem relevant to a case of unexpected truncation, and the
> atomic rename trick doesn't seem relevant either unless someone with
> expert knowledge of NTFS could explain how a crash could lead to
> truncation in the first place, and how rename would help.
>

I think the real cause of the truncation is either not known or has not been discussed here. It seems to me that these ideas are being proposed on the mere speculation that the current way of writing the file can lead to corruption in certain cases. I think it would be better to first dig into the actual cause of the problem.
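
To make the quoted point about UpdateControlFile concrete, here is a minimal sketch of a same-size, in-place overwrite, assuming a POSIX system. It is not the PostgreSQL code: the names overwrite_in_place, file_size, and DEMO_FILE_SIZE are hypothetical and chosen only for illustration. The idea is that a write at offset 0 of a file opened without O_TRUNC or O_APPEND replaces the contents but should leave the length as it was.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed size for the demo; not PostgreSQL's actual constant. */
#define DEMO_FILE_SIZE 8192

/*
 * Write len bytes from buf at offset 0 of path, then fsync.
 * The file is opened without O_TRUNC or O_APPEND, so an existing file
 * keeps its length if len equals its current size.
 * O_CREAT is used here only so the sketch can create its own demo file.
 * Returns 0 on success, -1 on any error.
 */
int overwrite_in_place(const char *path, const void *buf, size_t len)
{
    int     fd;
    ssize_t written;

    assert(path != NULL && buf != NULL && len > 0);

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    written = write(fd, buf, len);
    if (written < 0 || (size_t) written != len)
    {
        close(fd);
        return -1;
    }

    /* Ask the OS to flush the data to stable storage before returning. */
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }

    return close(fd);
}

/* Return the size of path in bytes, or -1 if stat() fails. */
long file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long) st.st_size;
}
```

Calling overwrite_in_place twice with a DEMO_FILE_SIZE buffer and checking file_size after each call should report the same length both times. The report in this thread involved NTFS, which this POSIX sketch does not exercise, so it only illustrates why a zero-length file is not an expected outcome of this write pattern.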

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com