Re: Is pg_control file crashsafe? - Mailing list pgsql-hackers

From Alex Ignatov
Subject Re: Is pg_control file crashsafe?
Date
Msg-id 572C5231.800@postgrespro.ru
In response to Re: Is pg_control file crashsafe?  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 05.05.2016 7:16, Amit Kapila wrote:
> On Wed, May 4, 2016 at 8:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  >
>  > Amit Kapila <amit.kapila16@gmail.com> writes:
>  > > On Wed, May 4, 2016 at 4:02 PM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:
>  > >> On 03.05.2016 2:17, Tom Lane wrote:
>  > >>> Writing a single sector ought to be atomic too.
>  >
>  > >> pg_control is 8k long (I think it is the length of one page in default PG
>  > >> compile settings).
>  >
>  > > The actual data written is always sizeof(ControlFileData) which
>  > > should be less than one sector.
>  >
>  > Yes.  We don't care what happens to the rest of the file as long as the
>  > first sector's worth is updated atomically.  See the comments for
>  > PG_CONTROL_SIZE and the code in ReadControlFile/WriteControlFile.
>  >
>  > We could change to a different PG_CONTROL_SIZE pretty easily, and there's
>  > certainly room to argue that reducing it to 512 or 1024 would be more
>  > efficient.  I think the motivation for setting it at 8K was basically
>  > "we're already assuming that 8K writes are efficient, so let's assume
>  > it here too".  But since the file is only written once per checkpoint,
>  > efficiency is not really a key selling point anyway.  If you could make
>  > an argument that some other size would reduce the risk of failures,
>  > it would be interesting --- but I suspect any such argument would be
>  > very dependent on the quirks of a specific file system.
>  >
>
> How about using 512 bytes as the write size and performing direct writes
> rather than going via the OS buffer cache for the control file?  Alex, is
> the issue reproducible (so that if we try to solve it in some way, we
> also have a way to test the fix)?
>
>  >
>  > One point worth considering is that on most file systems, rewriting
>  > a fraction of a page is *less* efficient than rewriting a full page,
>  > because the kernel first has to read in the old contents to fill
>  > the disk buffer it's going to partially overwrite with new data.
>  > This motivates against trying to reduce the write size too much.
>  >
>
> Yes, you are very much right and I have observed that recently during my
> work on WAL Re-Writes [1].  However, I think that won't be the issue if
> we use direct writes for control file.
>
>
> [1] -
> http://www.postgresql.org/message-id/CAA4eK1+=O33dZZ=jBtjXBFyD67R5dLcqFyOMj4f-qmFXBP1OOQ@mail.gmail.com
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

Hi!
No, the issue happened only once. Also, attempts to reproduce it have not
been successful yet.
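
For readers following the thread, here is a rough sketch of the scheme Tom
and Amit describe above: the meaningful record is kept smaller than one
sector, the file is padded out to PG_CONTROL_SIZE on write, and only the
leading record is read back and checked. This is an illustration in Python,
not the actual ReadControlFile/WriteControlFile code. The payload layout,
the function names, the 512-byte sector size, and the use of zlib's CRC-32
are all assumptions made for the example.

```python
import os
import struct
import zlib

PG_CONTROL_SIZE = 8192  # on-disk file size discussed in the thread
SECTOR_SIZE = 512       # assumed atomic write unit (hardware dependent)

# Hypothetical payload: system id, checkpoint location, state.
# This is NOT the real ControlFileData layout.
_PAYLOAD = struct.Struct("<QQI")
_CRC = struct.Struct("<I")
_RECORD_SIZE = _PAYLOAD.size + _CRC.size


def write_control_file(path, system_id, checkpoint_lsn, state):
    """Write the record plus zero padding in one write(), then fsync."""
    payload = _PAYLOAD.pack(system_id, checkpoint_lsn, state)
    record = payload + _CRC.pack(zlib.crc32(payload))
    # The whole scheme relies on the record fitting in a single sector.
    if len(record) > SECTOR_SIZE:
        raise ValueError("record does not fit in one sector")
    buf = record.ljust(PG_CONTROL_SIZE, b"\0")

    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o600)
    try:
        if os.write(fd, buf) != PG_CONTROL_SIZE:
            raise OSError("short write to control file")
        os.fsync(fd)  # push the data to stable storage before returning
    finally:
        os.close(fd)


def read_control_file(path):
    """Read only the leading record and verify its checksum."""
    with open(path, "rb") as f:
        record = f.read(_RECORD_SIZE)
    if len(record) != _RECORD_SIZE:
        raise ValueError("control file is too short")
    payload = record[:_PAYLOAD.size]
    (stored_crc,) = _CRC.unpack(record[_PAYLOAD.size:])
    if zlib.crc32(payload) != stored_crc:
        raise ValueError("control file checksum mismatch")
    return _PAYLOAD.unpack(payload)
```

The reader never looks past the first record, so the padding can end up in
any state after a crash without affecting the result. The checksum is what
lets a torn or otherwise damaged first sector be detected rather than
silently trusted. Whether a single-sector write is really atomic still
depends on the storage stack underneath.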


