Thread: Optimize crash recovery

Optimize crash recovery

From
Thunder
Date:
Hello hackers:

During crash recovery, for most xlog records we compare the record's LSN with the page LSN to determine whether the record has already been replayed.
The exceptions are full-page and init-page xlog records.
If the xlog record includes a full-page image of the page, the page is restored from that image.
If the xlog record includes init-page information, the page is initialized.
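
To make the rule above concrete, here is a minimal toy model of the per-record decision. It is only a sketch of the behaviour described in this message, not PostgreSQL source; the names `Record` and `redo_action` and the returned strings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for one xlog record touching a single page."""
    lsn: int
    has_full_page_image: bool = False
    inits_page: bool = False


def redo_action(record: Record, page_lsn: int) -> str:
    """Return what replay does with this record for the page (toy model)."""
    if record.has_full_page_image:
        # The image overwrites the page; the page LSN is not consulted.
        return "restore"
    if record.inits_page:
        # The page is re-initialized; the page LSN is not consulted.
        return "init"
    if record.lsn <= page_lsn:
        # The page already reflects this record.
        return "skip"
    return "apply"
```

With a page LSN of 0x90000, an ordinary record at 0x30000 is skipped, but a full-page record at 0x10000 is still restored, which is the case the question below is about.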

When page checksums are enabled and the page verifies successfully, can we compare the page LSN with the LSN of a full-page or init-page xlog record to determine whether it has already been replayed?

BRS
Ray


 

Re: Optimize crash recovery

From
Alvaro Herrera
Date:
On 2020-Mar-13, Thunder wrote:

> Hello hackers:
> 
> 
> During crash recovery, we compare most of the lsn of xlog record with page lsn to determine if the record has already been replayed.
> The exceptions are full-page and init-page xlog records.
> It's restored if the xlog record includes a full-page image of the page.
> And it initializes the page if the xlog record include init page information.
> 
> 
> When we enable checksum for the page and verify page success, can we
> compare the page lsn with the lsn of full-page xlog record or init
> page xlog record to determine it has already been replayed?

In order to verify that the checksum passes, you have to read the page
first.  So what are you optimizing?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re:Re: Optimize crash recovery

From
Thunder
Date:
For example, suppose the page LSN in storage is 0x90000 and replay starts from 0x10000.
If the record at 0x10000 is a full-page xlog record, then we can skip replaying the xlog records between 0x10000 and 0x90000 for this page.
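
The example can be walked through with a small simulation. This is a sketch under assumptions: the function name `replayed_lsns` is hypothetical, and the record LSNs 0x30000 and 0xA0000 are invented to fill out the example; only 0x10000 and 0x90000 come from the text above.

```python
def replayed_lsns(records, page_lsn, checksum_ok, use_proposal):
    """Return the LSNs actually replayed for one page (toy model).

    records: list of (lsn, is_full_page) tuples in WAL order.
    page_lsn: LSN found on the page in storage.
    """
    applied = []
    for lsn, is_full_page in records:
        if is_full_page:
            skip = use_proposal and checksum_ok and lsn <= page_lsn
            if not skip:
                # Restoring the image rewinds the page to this record's state.
                page_lsn = lsn
                applied.append(lsn)
        elif lsn > page_lsn:
            page_lsn = lsn
            applied.append(lsn)
    return applied


# Hypothetical WAL for one page: a full-page record, then ordinary records.
wal = [(0x10000, True), (0x30000, False), (0x90000, False), (0xA0000, False)]

current = replayed_lsns(wal, 0x90000, checksum_ok=True, use_proposal=False)
proposed = replayed_lsns(wal, 0x90000, checksum_ok=True, use_proposal=True)
```

In this model `current` contains all four records, because restoring the full-page image rewinds the page to 0x10000, while `proposed` contains only 0xA0000.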

Is there any correctness issue if the page exists in the buffer pool and we skip replaying a full-page or init-page record when the page LSN is larger than the LSN of the xlog record?









 

Re: Re: Optimize crash recovery

From
Alvaro Herrera
Date:
On 2020-Mar-14, Thunder wrote:

> For example, if page lsn in storage is 0x90000 and start to replay from 0x10000.
> If 0x10000 is full-page xlog record, then we can ignore to replay xlog between 0x10000~0x90000 for this page.
> 
> 
> Is there any correct issue if the page exists in the buffer pool and
> ignore to replay for full-page or init page if page lsn is larger than
> the lsn of xlog record?

Oh! right.  The assumption, before we had page-level checksums, was that
the page at LSN 0x90000 could have been partially written, so the upper
half of the contents would actually be older and thus restoring the FPI
(and all subsequent WAL changes) was mandatory.  But if the page
checksum verifies, then there's no need to return the page back to an
old state only to replay everything to bring it to the new state again.

This seems a potentially worthwhile optimization ...
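
A toy illustration of the torn-page argument, as a sketch only: the page layout, the helper names, and the use of `zlib.crc32` as a stand-in for the real page checksum algorithm are all assumptions made for this example.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page: a header (lsn, checksum) plus two halves of data."""
    lsn: int
    checksum: int
    first_half: bytes
    second_half: bytes


def compute_checksum(lsn: int, first_half: bytes, second_half: bytes) -> int:
    # CRC-32 is only a stand-in for the real page checksum.
    return zlib.crc32(lsn.to_bytes(8, "little") + first_half + second_half)


def make_page(lsn: int, first_half: bytes, second_half: bytes) -> Page:
    return Page(lsn, compute_checksum(lsn, first_half, second_half),
                first_half, second_half)


def checksum_ok(page: Page) -> bool:
    return page.checksum == compute_checksum(
        page.lsn, page.first_half, page.second_half)


def can_skip_fpi(page: Page, fpi_lsn: int) -> bool:
    """Proposed rule: skip the image only if the page verifies and is newer."""
    return checksum_ok(page) and page.lsn >= fpi_lsn


old = make_page(0x10000, b"old-first", b"old-second")
new = make_page(0x90000, b"new-first", b"new-second")

# Torn write: the header and first half of the new page reached disk,
# but the second half still holds the old contents.
torn = Page(new.lsn, new.checksum, new.first_half, old.second_half)
```

Here the torn page carries LSN 0x90000 but fails verification, so the full-page image must still be restored; the intact page at 0x90000 verifies, so under the proposal the image at 0x10000 could be skipped.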

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Re: Optimize crash recovery

From
Thomas Munro
Date:
On Sat, Mar 14, 2020 at 5:31 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2020-Mar-14, Thunder wrote:
> > For example, if page lsn in storage is 0x90000 and start to replay from 0x10000.
> > If 0x10000 is full-page xlog record, then we can ignore to replay xlog between 0x10000~0x90000 for this page.
> >
> > Is there any correct issue if the page exists in the buffer pool and
> > ignore to replay for full-page or init page if page lsn is larger than
> > the lsn of xlog record?
>
> Oh! right.  The assumption, before we had page-level checksums, was that
> the page at LSN 0x90000 could have been partially written, so the upper
> half of the contents would actually be older and thus restoring the FPI
> (and all subsequent WAL changes) was mandatory.  But if the page
> checksum verifies, then there's no need to return the page back to an
> old state only to replay everything to bring it to the new state again.
>
> This seems a potentially worthwhile optimization ...

One problem is that you now have to read the block from disk, which
causes an I/O stall if the page is not already in the kernel page
cache.  That could be worse than the cost of replaying all the WAL
records you get to skip with this trick.  My WAL prefetching patch[1]
could mitigate that problem to some extent, depending on how much
prefetching your system can do.  The current version of the patch has
a GUC wal_prefetch_fpw to control whether it bothers to prefetch pages
that we have an FPI for, because normally there's no point, but with this
trick you'd want to turn that on.
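
The trade-off can be written down as a rough break-even check. This is a deliberately crude sketch: the function name, the parameters, and any numbers plugged into it are hypothetical, and it ignores the cost of restoring the image itself.

```python
def skipping_pays_off(read_stall_us: float,
                      replay_cost_us_per_record: float,
                      skipped_records: int,
                      page_cached: bool) -> bool:
    """Return True if reading the page costs less than the replay it avoids."""
    # A page already in the kernel page cache is assumed to cost no stall;
    # effective prefetching would move more pages into this case.
    read_cost = 0.0 if page_cached else read_stall_us
    replay_saved = replay_cost_us_per_record * skipped_records
    return read_cost < replay_saved
```

In this model an uncached read only pays off when many records are skipped, while a cached or prefetched page makes skipping a win as soon as any replay work is saved.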

[1] https://commitfest.postgresql.org/27/2410/