Re: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools? - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools?
Date
Msg-id 9CCDF037-BB5F-4369-AE12-37B25C20B0EF@gmail.com
In response to RE: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools?  (HECTOR INGERTO <hector_25e@hotmail.com>)
List pgsql-general
> On 16 Jan 2023, at 15:37, HECTOR INGERTO <HECTOR_25E@hotmail.com> wrote:
>
> > The database relies on the data being consistent when it performs crash recovery.
> > Imagine that a checkpoint is running while you take your snapshot.  The checkpoint
> > syncs a data file with a new row to disk.  Then it writes a WAL record and updates
> > the control file.  Now imagine that the table with the new row is on a different
> > file system, and your snapshot captures the WAL and the control file, but not
> > the new row (it was still sitting in the kernel page cache when the snapshot was taken).
> > You end up with a lost row.
> >
> > That is only one scenario.  Many other ways of corruption can happen.
>
> Can we say then that the risk comes only from the possibility of a checkpoint running inside the time gap between the
> non-simultaneous snapshots?

I recently followed a course on distributed algorithms and recognised one of the patterns here.

The problem boils down to a distributed snapshotting algorithm, where the two ZFS filesystem processes each initiate
their own snapshot independently.

Unless they communicate with each other and with the database about which messages (in this case, traffic between the
database and each FS) are part of their snapshots (sent or received), messages can get lost: either none of the process
snapshots records that a 'message' was sent, or none records that it was received.
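
To make the lost-message case concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions, not how PostgreSQL or ZFS work internally: the `FileSystem` class, the timestamps and the two hard-coded writes are all hypothetical, and the only "consistency rule" checked is that a checkpoint record in the WAL snapshot implies the row is in the data snapshot.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class FileSystem:
    """Hypothetical file system that remembers when each item was written."""
    writes: List[Tuple[float, str]] = field(default_factory=list)

    def write(self, time: float, item: str) -> None:
        self.writes.append((time, item))

    def snapshot(self, time: float) -> Set[str]:
        # A snapshot contains everything written up to and including `time`.
        return {item for written_at, item in self.writes if written_at <= time}


ROW = "row 42"
CHECKPOINT = "checkpoint covering row 42"


def snapshots_consistent(data_snapshot_time: float, wal_snapshot_time: float) -> bool:
    """Return True if the pair of snapshots could be restored together."""
    data_fs, wal_fs = FileSystem(), FileSystem()

    # The "messages" from the database to each file system, in time order.
    data_fs.write(1.0, ROW)         # the row reaches the data file system
    wal_fs.write(2.0, CHECKPOINT)   # afterwards the WAL claims it is safe

    data_snapshot = data_fs.snapshot(data_snapshot_time)
    wal_snapshot = wal_fs.snapshot(wal_snapshot_time)

    # Inconsistent: the WAL says the row was checkpointed, but it is missing.
    return CHECKPOINT not in wal_snapshot or ROW in data_snapshot


if __name__ == "__main__":
    print(snapshots_consistent(3.0, 3.0))  # both snapshots after both writes
    print(snapshots_consistent(0.5, 3.0))  # data snapshot too early
```

In this model, taking both snapshots at the same instant always passes the check, while the second call pairs an early data snapshot with a later WAL snapshot and loses the row.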

Algorithms like Tarry, Lai-Yang or the Echo algorithm solve this by adding communication between those processes about
messages in transit.
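
As an illustration of that idea, below is a heavily simplified sketch in the spirit of Lai-Yang's colouring scheme. It is an assumption-laden toy, not the full algorithm: the `Process` class, the two-process setup and the "balance" being transferred are hypothetical, and the real algorithm derives channel contents from message histories rather than from the shortcut used here. Each message carries the sender's colour; a process that has not yet taken its snapshot (white) takes it before handling a message from a process that has (red), and a red process records white messages as in transit.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Message = Tuple[bool, int]  # (sender was red when sending, amount)


@dataclass
class Process:
    balance: int
    red: bool = False                  # red = local snapshot already taken
    snapshot: Optional[int] = None     # recorded local state
    in_transit: List[int] = field(default_factory=list)

    def take_snapshot(self) -> None:
        if not self.red:
            self.red = True
            self.snapshot = self.balance

    def send(self, amount: int) -> Message:
        self.balance -= amount
        return (self.red, amount)      # tag the message with our colour

    def receive(self, message: Message) -> None:
        sender_red, amount = message
        if sender_red:
            # Snapshot before processing a post-snapshot message.
            self.take_snapshot()
        elif self.red:
            # Sent before the sender's snapshot, received after ours:
            # the message was in transit and belongs to the global snapshot.
            self.in_transit.append(amount)
        self.balance += amount


def demo() -> int:
    """Return the total recorded by the snapshot; 200 units exist in total."""
    p, q = Process(100), Process(100)
    m1 = p.send(10)      # white message, still in flight
    q.take_snapshot()    # q snapshots on its own initiative
    q.receive(m1)        # recorded as in transit
    m2 = q.send(5)       # red message
    p.receive(m2)        # forces p to snapshot before applying it
    return p.snapshot + q.snapshot + sum(p.in_transit) + sum(q.in_transit)


if __name__ == "__main__":
    print(demo())
```

The recorded local states plus the recorded in-transit messages add up to the total that actually exists, which is exactly the property that two uncoordinated filesystem snapshots cannot guarantee.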

Alban Hertroys
--
There is always an exception to always.






