Re: Spreading full-page writes - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Spreading full-page writes
Date
Msg-id CAA4eK1+o3rpboe37uPYVBC9bFWAhUE89dL_E058fP3LJ25w4Ww@mail.gmail.com
In response to Re: Spreading full-page writes  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Mon, Jun 2, 2014 at 6:04 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, May 28, 2014 at 1:10 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > IIUC, in the DBW mechanism we need a temporary sequential
> > log file of fixed size which is used to write data before the data
> > gets written to its actual location in the tablespace.  Now as the
> > temporary log file is of fixed size, the number of pages that need
> > to be read during recovery should be less compared to FPW, because
> > with FPW it needs to read all the pages written to the WAL log after
> > the last successful checkpoint.
>
> Hmm... maybe I'm misunderstanding how WAL replay works in DBW case.
> Imagine the case where we try to replay two WAL records for the page A and
> the page has not been cached in shared_buffers yet. If FPW is enabled,
> the first WAL record is FPW and firstly it's just read to shared_buffers.
> The page doesn't need to be read from the disk. Then the second WAL record
> will be applied.
>
> OTOH, in DBW case, how does this example case work? I was thinking that
> firstly we try to apply the first WAL record but find that the page A doesn't
> exist in shared_buffers yet. We try to read the page from the disk, check
> whether its CRC is valid or not, and read the same page from double buffer
> if it's invalid. After reading the page into shared_buffers, the first WAL
> record can be applied. Then the second WAL record will be applied. Is my
> understanding right?

I think the way DBW works is that before reading WAL, it first makes
the data pages consistent.  It checks the doublewrite buffer contents
against the pages at their original locations: if a page is
inconsistent (torn) in the doublewrite buffer, it is simply discarded;
if it is inconsistent in the tablespace, it is recovered from the
doublewrite buffer.  After reaching the end of the doublewrite buffer,
recovery starts reading WAL.
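If that understanding is right, the consistency pass can be sketched
as below.  This is a toy model in Python, not actual PostgreSQL code:
recover_doublewrite, the page layout, and the CRC stand-in are all my
own illustration.

```python
import zlib

def checksum(page: bytes) -> int:
    # stand-in for a per-page checksum/CRC
    return zlib.crc32(page)

def recover_doublewrite(dbw_entries, tablespace):
    """Make data pages consistent before WAL replay begins.

    dbw_entries: list of (page_no, page_bytes, stored_crc) read from the
    doublewrite buffer.  tablespace: dict mapping page_no to
    (page_bytes, stored_crc) at the page's original location.
    Returns the set of page numbers restored from the doublewrite buffer.
    """
    restored = set()
    for page_no, dbw_page, dbw_crc in dbw_entries:
        if checksum(dbw_page) != dbw_crc:
            # torn write into the doublewrite buffer itself: the original
            # page was never overwritten, so the DBW copy is discarded
            continue
        page, crc = tablespace[page_no]
        if checksum(page) != crc:
            # torn write at the original location: recover from the DBW copy
            tablespace[page_no] = (dbw_page, dbw_crc)
            restored.add(page_no)
    return restored
```

Since the pass only ever walks the fixed-size doublewrite buffer once,
its cost is bounded regardless of how much WAL follows.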

So in the above example case, it will read the first record from WAL
and check whether the page is already in shared_buffers; if so, it
applies the WAL change, else it reads the page into shared_buffers
and then applies the WAL change.  For the second record, it doesn't
need to read the page again.
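Under that reading, the replay loop itself is unchanged; the only
difference is that a page read on a buffer miss can be trusted,
because the doublewrite pass already repaired any torn pages.  A toy
sketch (the names and callbacks are mine, not PostgreSQL's):

```python
def replay_wal(wal_records, shared_buffers, read_page, apply_record):
    """wal_records: iterable of (page_no, record).  Pages on disk are
    assumed already consistent (the doublewrite pass ran first), so a
    plain read suffices -- no full-page image is needed in the stream.
    Returns the number of disk reads performed."""
    reads = 0
    for page_no, record in wal_records:
        if page_no not in shared_buffers:
            # buffer miss: read the (already consistent) page from disk
            shared_buffers[page_no] = read_page(page_no)
            reads += 1
        # apply the change to the cached copy
        shared_buffers[page_no] = apply_record(shared_buffers[page_no], record)
    return reads
```

With two records for the same page, only the first one pays for a disk
read; the second finds the page in shared_buffers.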

The saving during recovery comes from the fact that in the case of
DBW, it does not read FPIs from WAL, rather just the two records
(it still has to read a WAL page, but that page will contain many
records).  So it seems to be a net win.

Now in the case of DBW, the extra work done (reading the double
buffer and checking its consistency against the actual pages) is
always fixed, as the size of the double buffer is fixed, so its
impact should be much less than reading the FPIs written to WAL
since the last successful checkpoint.

If my understanding above is right, then recovery performance should
be better with DBW in most cases.

I think the case where DBW might need to take care is when there are
a lot of backend evictions.  In such scenarios a backend might itself
need to write both to the double buffer and to the actual page.  That
can have more impact during bulk reads (when hint bits have to be
set) and during Vacuum, which is performed in a ring buffer.

One improvement that can be done here is to change the buffer
eviction algorithm so that it passes over buffers that still need to
be written to the double buffer.  There can be other improvements as
well, depending on the DBW implementation.
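As a rough illustration of that idea (a purely hypothetical policy,
not PostgreSQL's actual clock-sweep algorithm):

```python
def choose_victim(buffers):
    """buffers: list of dicts with 'page_no' and 'dirty' keys.
    Prefer a clean buffer, which can be dropped without paying for a
    doublewrite-buffer write plus a data-page write; fall back to the
    first dirty buffer only when no clean one exists."""
    for buf in buffers:
        if not buf["dirty"]:
            return buf
    return buffers[0]
```

Whether such a preference actually helps would depend on how often
clean buffers are available when a backend needs to evict.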

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
