Thread: postgres on a non-journaling filesystem

From
maayan mordehai
Date:
Hello,

I'm Maayan, and I'm on a DBA team that uses PostgreSQL.
I saw in the documentation on WAL (https://www.postgresql.org/docs/10/wal-intro.html),
in the tip box, that it's better not to use a journaling filesystem, and I wanted to ask how that works.
Can't we get corruption that we can't recover from?
I mean, what if postgres is in the middle of a write to the WAL when there is a crash, and the write doesn't finish?
I'm assuming postgres will detect this at startup and report that the transaction was rolled back. Am I right?
And how does it work at the data level? If only part of an 8k block has been written when there is a crash, how does postgres deal with it?

Thanks in advance

Re: postgres on a non-journaling filesystem

From
Heikki Linnakangas
Date:
On 23/01/2019 01:03, maayan mordehai wrote:
> Hello,
> 
> I'm Maayan, and I'm on a DBA team that uses PostgreSQL.
> I saw in the documentation on WAL:
> https://www.postgresql.org/docs/10/wal-intro.html
> in the tip box, that it's better not to use a journaling filesystem, and I
> wanted to ask how that works.
> Can't we get corruption that we can't recover from?
> I mean, what if postgres is in the middle of a write to the WAL when there
> is a crash, and the write doesn't finish?
> I'm assuming postgres will detect this at startup and report that the
> transaction was rolled back. Am I right?

Yep, any half-written transactions will be rolled back.
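
As a rough illustration of how a half-written record can be detected: each
record in a write-ahead log can carry a checksum, and replay stops at the
first record that is short or fails the check. The sketch below is a toy
model, not PostgreSQL's actual record format or code. The framing (4-byte
length plus 4-byte checksum), the function names, and the use of zlib's
CRC-32 are all assumptions made for the example.

```python
import struct
import zlib


def encode_record(payload: bytes) -> bytes:
    """Frame a record as: 4-byte length, 4-byte CRC-32 of the payload, payload.

    Hypothetical framing for illustration only.
    """
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes) -> list[bytes]:
    """Return the payloads of all valid records, stopping at the first torn one."""
    records = []
    pos = 0
    while pos + 8 <= len(log):
        length, crc = struct.unpack_from("<II", log, pos)
        payload = log[pos + 8 : pos + 8 + length]
        # A record cut short by a crash is either truncated or fails its checksum.
        if len(payload) < length or zlib.crc32(payload) != crc:
            break
        records.append(payload)
        pos += 8 + length
    return records


intact_log = encode_record(b"insert row 1") + encode_record(b"commit txn 1")
# Simulate a crash in the middle of writing a third record.
torn_log = intact_log + encode_record(b"insert row 2")[:-5]
```

In this model, replaying torn_log yields only the two complete records. The
second transaction never gets a commit record, so it counts as not committed.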

> And how does it work at the data level? If only part of an 8k block has been
> written when there is a crash, how does postgres deal with it?

The first time a block is modified after a checkpoint, a copy of the 
block is written to the WAL. At crash recovery, the block is restored 
from the WAL. This mechanism is called "full page writes".
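
The mechanism can be sketched with a small simulation. This is a toy model
under stated assumptions, not PostgreSQL's implementation: the class ToyStore,
its method names, and the record tuples are invented for the example, and the
page image stored here is the page contents after the first change.

```python
PAGE_SIZE = 8192


class ToyStore:
    """Toy page store with a WAL that logs a full page image on first touch."""

    def __init__(self, n_pages: int) -> None:
        self.disk = [bytes(PAGE_SIZE) for _ in range(n_pages)]  # data files
        self.cache: dict[int, bytearray] = {}  # in-memory copies of pages
        self.wal: list[tuple] = []  # survives the simulated crash
        self.imaged: set[int] = set()  # pages imaged since the last checkpoint

    def checkpoint(self) -> None:
        # Flush every cached page completely, then start a new WAL cycle.
        for page_no, page in self.cache.items():
            self.disk[page_no] = bytes(page)
        self.wal.clear()
        self.imaged.clear()

    def modify(self, page_no: int, offset: int, data: bytes) -> None:
        page = self.cache.setdefault(page_no, bytearray(self.disk[page_no]))
        page[offset : offset + len(data)] = data
        if page_no not in self.imaged:
            # First change since the checkpoint: log the whole page.
            self.wal.append(("image", page_no, bytes(page)))
            self.imaged.add(page_no)
        else:
            # Later changes: log only the modified bytes.
            self.wal.append(("change", page_no, offset, data))

    def torn_write(self, page_no: int, n_bytes: int) -> None:
        # Simulate a crash mid-write: only the first n_bytes reach the disk.
        new = bytes(self.cache[page_no])
        self.disk[page_no] = new[:n_bytes] + self.disk[page_no][n_bytes:]

    def recover(self) -> None:
        # Crash recovery: the cache is gone; rebuild pages from the WAL.
        self.cache.clear()
        for rec in self.wal:
            if rec[0] == "image":
                _, page_no, image = rec
                self.disk[page_no] = image  # overwrites any torn page
            else:
                _, page_no, offset, data = rec
                page = bytearray(self.disk[page_no])
                page[offset : offset + len(data)] = data
                self.disk[page_no] = bytes(page)


db = ToyStore(1)
db.modify(0, 0, b"A" * 10)      # first change: full page image logged
db.modify(0, 8000, b"B" * 10)   # second change: only the delta logged
db.torn_write(0, 4096)          # crash: half new page, half old page on disk
torn_page = db.disk[0]
db.recover()
recovered_page = db.disk[0]
```

After torn_write the on-disk page holds the first change but not the second.
Recovery does not depend on that torn page at all, because the logged image
replaces it before the later change is reapplied.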

The WAL works just like the journal in a journaling filesystem. That's 
why it's not necessary to have journaling at the filesystem level.

- Heikki


Re: postgres on a non-journaling filesystem

From
maayan mordehai
Date:
Thank you!!

Re: postgres on a non-journaling filesystem

From
Andres Freund
Date:
On 2019-01-23 14:20:52 +0200, Heikki Linnakangas wrote:
> The WAL works just like the journal in a journaling filesystem. That's why
> it's not necessary to have journaling at the filesystem level.

But note that not having journaling at the FS level often makes OS startup
after a crash *painfully* slow, because fsck or similar will be run. And
that's often necessary for the internal consistency of the FS.

Note that even with journaling enabled, most filesystems by default don't
journal data, so you can get those partial writes anyway.
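
To make the last point concrete, take ext4 as an example (the thread does not
name a filesystem, so this choice is an assumption). ext4 has three data modes,
selected with the data= mount option: journal, ordered, and writeback. The
default is ordered, which journals metadata only. The helper names below are
hypothetical, and the sketch only parses a mount-option string such as the
fourth field of a line in /proc/mounts.

```python
def ext4_data_mode(mount_options: str) -> str:
    """Return the ext4 data mode named in a comma-separated option string.

    Falls back to "ordered", the ext4 default, when no data= option is present.
    """
    for opt in mount_options.split(","):
        key, _, value = opt.strip().partition("=")
        if key == "data" and value:
            return value
    return "ordered"


def journals_file_data(mount_options: str) -> bool:
    """True only when file contents, not just metadata, go through the journal."""
    return ext4_data_mode(mount_options) == "journal"


default_mount = "rw,relatime"
full_journal_mount = "rw,relatime,data=journal"
```

With default_mount the helper reports ordered mode, so file contents bypass
the journal and a crash can still leave a partially written block.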

Greetings,

Andres Freund