Re: Is pg_control file crashsafe? - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Is pg_control file crashsafe?
Msg-id CAA4eK1Jkoj-8JF4NDyLvPGbFQUbxbMx=nzkc7B8VrB4nRq=s-w@mail.gmail.com
In response to Re: Is pg_control file crashsafe?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Is pg_control file crashsafe?
Re: Is pg_control file crashsafe?
List pgsql-hackers
On Wed, May 4, 2016 at 8:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > On Wed, May 4, 2016 at 4:02 PM, Alex Ignatov <a.ignatov@postgrespro.ru>
> > wrote:
> >> On 03.05.2016 2:17, Tom Lane wrote:
> >>> Writing a single sector ought to be atomic too.
>
> >> pg_control is 8k long (I think it is the length of one page in default PG
> >> compile settings).
>
> > The actual data written is always sizeof(ControlFileData) which should be
> > less than one sector.
>
> Yes.  We don't care what happens to the rest of the file as long as the
> first sector's worth is updated atomically.  See the comments for
> PG_CONTROL_SIZE and the code in ReadControlFile/WriteControlFile.
>
> We could change to a different PG_CONTROL_SIZE pretty easily, and there's
> certainly room to argue that reducing it to 512 or 1024 would be more
> efficient.  I think the motivation for setting it at 8K was basically
> "we're already assuming that 8K writes are efficient, so let's assume
> it here too".  But since the file is only written once per checkpoint,
> efficiency is not really a key selling point anyway.  If you could make
> an argument that some other size would reduce the risk of failures,
> it would be interesting --- but I suspect any such argument would be
> very dependent on the quirks of a specific file system.
>
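The scheme in the quoted text can be sketched in C roughly as follows. This is a minimal illustration, not PostgreSQL's actual WriteControlFile: `DemoControlData` and `write_control_file` are hypothetical names, the 512-byte `SECTOR_SIZE` is an assumed atomic write unit, and the checksum field is left uncomputed. The point is that the meaningful payload fits in the first sector, and the rest of the 8K write is zero padding.

```c
#define _XOPEN_SOURCE 700      /* expose pwrite/fsync/pread under strict C */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins; the real values live in PostgreSQL's headers. */
#define PG_CONTROL_SIZE 8192   /* bytes written per update (8K as discussed) */
#define SECTOR_SIZE     512    /* assumed atomic write unit of the device */

/* Hypothetical, much-reduced stand-in for ControlFileData. */
typedef struct DemoControlData
{
    uint64_t system_identifier;
    uint32_t state;
    uint64_t checkpoint_lsn;
    uint32_t crc;              /* checksum slot; not computed in this sketch */
} DemoControlData;

/* The crash-safety argument needs the real payload to fit in one sector. */
static_assert(sizeof(DemoControlData) <= SECTOR_SIZE,
              "control data must fit in a single sector");

/* Copy the struct to offset 0 of a zeroed PG_CONTROL_SIZE buffer,
 * write the whole buffer at file offset 0, then fsync. */
static int
write_control_file(const char *path, const DemoControlData *data)
{
    char buffer[PG_CONTROL_SIZE];
    int fd;

    memset(buffer, 0, sizeof(buffer));
    memcpy(buffer, data, sizeof(*data));

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (pwrite(fd, buffer, sizeof(buffer), 0) != (ssize_t) sizeof(buffer) ||
        fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Shrinking `PG_CONTROL_SIZE` to 512 or 1024 would change only the amount of padding; the compile-time check on the payload size stays the same.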

How about using 512 bytes as the write size and performing direct writes for the control file, rather than going through the OS buffer cache? Alex, is the issue reproducible? If we try to solve it in some way, we need a way to test the fix as well.
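A rough, Linux-only sketch of the 512-byte direct-write idea, assuming a device whose logical block size is 512 bytes (a disk with 4096-byte logical sectors would reject such a write). `write_sector_direct` and `DIRECT_WRITE_SIZE` are hypothetical names. O_DIRECT requires an aligned buffer, offset, and length, hence `posix_memalign`; and because O_DIRECT does not by itself make the data durable, the sketch still calls `fsync`. If the O_DIRECT attempt fails (tmpfs, for example, rejects O_DIRECT with EINVAL), it retries once through the page cache, which of course gives up the bypass.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIRECT_WRITE_SIZE 512  /* proposed write size: one logical sector */

/*
 * Write `len` bytes (len <= DIRECT_WRITE_SIZE) at offset 0 as one aligned,
 * zero-padded 512-byte block.  First try with O_DIRECT; if open, write or
 * fsync fails for any reason, retry once with a normal buffered open.
 */
static int
write_sector_direct(const char *path, const void *payload, size_t len)
{
    void *buf = NULL;
    int flags[2] = {O_WRONLY | O_CREAT | O_DIRECT, O_WRONLY | O_CREAT};
    int rc = -1;

    if (len > DIRECT_WRITE_SIZE ||
        posix_memalign(&buf, DIRECT_WRITE_SIZE, DIRECT_WRITE_SIZE) != 0)
        return -1;
    memset(buf, 0, DIRECT_WRITE_SIZE);
    memcpy(buf, payload, len);

    for (int i = 0; i < 2 && rc != 0; i++)
    {
        int fd = open(path, flags[i], 0600);

        if (fd < 0)
            continue;
        /* O_DIRECT skips the page cache but is not a durability guarantee. */
        if (pwrite(fd, buf, DIRECT_WRITE_SIZE, 0) == DIRECT_WRITE_SIZE &&
            fsync(fd) == 0)
            rc = 0;
        close(fd);
    }
    free(buf);
    return rc;
}
```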
 
>
> One point worth considering is that on most file systems, rewriting
> a fraction of a page is *less* efficient than rewriting a full page,
> because the kernel first has to read in the old contents to fill
> the disk buffer it's going to partially overwrite with new data.
> This motivates against trying to reduce the write size too much.
>

Yes, you are quite right, and I observed that recently during my work on WAL Re-Writes [1]. However, I think that won't be an issue if we use direct writes for the control file.
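Tom's efficiency point comes down to page arithmetic: a buffered write that covers only part of a page requires the rest of that page to be in memory first, which may mean reading it from disk. A small sketch of that count follows; `partial_pages` is a hypothetical helper, and the 4096-byte page size in the examples is an assumption (typical on Linux x86-64). With O_DIRECT, as proposed above, the page cache is bypassed, so this count applies only to buffered writes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many page-cache pages a buffered write of `len` bytes at
 * `offset` only partially overwrites.  Each such page must be read in
 * (if not already cached) before the new bytes can be merged into it.
 */
static int
partial_pages(size_t offset, size_t len, size_t page_size)
{
    size_t end = offset + len;
    int count = 0;

    if (len == 0)
        return 0;
    if (offset % page_size != 0)
        count++;                              /* head page is partial */
    if (end % page_size != 0 &&
        (count == 0 || end / page_size != offset / page_size))
        count++;                              /* separate partial tail page */
    return count;
}
```

For example, `partial_pages(0, 512, 4096)` is 1 (a 512-byte control-file write lands in one partially overwritten page), while `partial_pages(0, 8192, 4096)` is 0.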


[1] - http://www.postgresql.org/message-id/CAA4eK1+=O33dZZ=jBtjXBFyD67R5dLcqFyOMj4f-qmFXBP1OOQ@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
