Thread: Production block comparison facility

Production block comparison facility

From
Simon Riggs
Date:
The block comparison facility presented earlier by Heikki would not be
able to be used in production systems. ISTM that it would be desirable
to have something that could be used in that way.

ISTM easy to make these changes

* optionally generate a FPW for every WAL record, not just first
change after checkpoint
full_page_writes = 'always'

* when an FPW arrives, optionally run a check to see if it compares
correctly against the page already there, when running streaming
replication without a recovery target. We could skip reporting any
problems until the database is consistent
full_page_write_check = on

The above changes seem easy to implement.

With FPW compression, this would be a usable feature in production.

Comments?

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Production block comparison facility

From
Michael Paquier
Date:
On Sun, Jul 20, 2014 at 5:31 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The block comparison facility presented earlier by Heikki would not be
> able to be used in production systems. ISTM that it would be desirable
> to have something that could be used in that way.
>
> ISTM easy to make these changes
>
> * optionally generate a FPW for every WAL record, not just first
> change after checkpoint
> full_page_writes = 'always'
>
> * when an FPW arrives, optionally run a check to see if it compares
> correctly against the page already there, when running streaming
> replication without a recovery target. We could skip reporting any
> problems until the database is consistent
> full_page_write_check = on
>
> The above changes seem easy to implement.
>
> With FPW compression, this would be a usable feature in production.
>
> Comments?

This is an interesting idea, and it would be easier to use than what
has been submitted for CF1. However, full_page_writes set to "always"
would generate a large amount of WAL even for small records,
increasing I/O for the partition holding pg_xlog, and the frequency of
checkpoints run on the system. Is this really something suitable for
production?
Then, looking at the code, we would need to tweak XLogInsert for the
WAL record construction to always do a FPW and to update
XLogCheckBufferNeedsBackup. Then for the redo part, we would need to
do some extra operations in the area of
RestoreBackupBlock/RestoreBackupBlockContents, including masking
operations before comparing the content of the FPW and the current
page.

Does that sound right?
-- 
Michael
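
As a rough illustration of the insert-side decision discussed above, here is a minimal sketch in C. It is not the actual XLogInsert or XLogCheckBufferNeedsBackup code: the enum, the function name and the Lsn typedef are hypothetical, and the 'always' value is the setting proposed in this thread, not an existing one. The "first change after checkpoint" rule is modeled as a comparison of the page LSN with the checkpoint redo pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical three-state setting; 'always' is the value proposed here. */
typedef enum { FPW_OFF, FPW_ON, FPW_ALWAYS } FpwMode;

typedef uint64_t Lsn;           /* stand-in for a WAL position */

/*
 * Decide whether a record touching a page should carry a full-page image.
 * page_lsn: position of the last change already applied to the page.
 * redo_ptr: redo position of the latest checkpoint.
 */
bool
needs_full_page_image(FpwMode mode, Lsn page_lsn, Lsn redo_ptr)
{
    assert(mode == FPW_OFF || mode == FPW_ON || mode == FPW_ALWAYS);

    switch (mode)
    {
        case FPW_ALWAYS:
            return true;                    /* proposed: every record */
        case FPW_ON:
            return page_lsn <= redo_ptr;    /* first change after checkpoint */
        case FPW_OFF:
            return false;
    }
    return false;
}
```

With FPW_ALWAYS the page LSN is irrelevant, which is where the extra WAL volume raised above comes from.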



Re: Production block comparison facility

From
Simon Riggs
Date:
On 22 July 2014 08:49, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Sun, Jul 20, 2014 at 5:31 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The block comparison facility presented earlier by Heikki would not be
>> able to be used in production systems. ISTM that it would be desirable
>> to have something that could be used in that way.
>>
>> ISTM easy to make these changes
>>
>> * optionally generate a FPW for every WAL record, not just first
>> change after checkpoint
>> full_page_writes = 'always'
>>
>> * when an FPW arrives, optionally run a check to see if it compares
>> correctly against the page already there, when running streaming
>> replication without a recovery target. We could skip reporting any
>> problems until the database is consistent
>> full_page_write_check = on
>>
>> The above changes seem easy to implement.
>>
>> With FPW compression, this would be a usable feature in production.
>>
>> Comments?
>
> This is an interesting idea, and it would be easier to use than what
> has been submitted for CF1. However, full_page_writes set to "always"
> would generate a large amount of WAL even for small records,
> increasing I/O for the partition holding pg_xlog, and the frequency of
> checkpoints run on system. Is this really something suitable for
> production?

For critical systems, yes, I think it is.

It would be possible to make that user selectable for particular
transactions or tables.

> Then, looking at the code, we would need to tweak XLogInsert for the
> WAL record construction to always do a FPW and to update
> XLogCheckBufferNeedsBackup. Then for the redo part, we would need to
> do some extra operations in the area of
> RestoreBackupBlock/RestoreBackupBlockContents, including masking
> operations before comparing the content of the FPW and the current
> page.
>
> Does that sound right?

Yes, it doesn't look very much code because it fits well with existing
approaches.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Production block comparison facility

From
Greg Stark
Date:
If you're always going FPW then there's no point in the rest of the record.
The point here was to find problems so that users could run normally with
confidence.

The cases you might want to run in the mode you describe are the build farm
or integration testing. When testing your application on the next release
of postgres it would be nice to have tests for the replication in your
workload given the experience in 9.3.

Even without the constant full page writes a live production system could do
a FPW comparison after a FPW if it was in a consistent state. That would
give standbys periodic verification at low costs.

-- 
greg


Re: Production block comparison facility

From
Simon Riggs
Date:
On 22 July 2014 12:54, Greg Stark <stark@mit.edu> wrote:
> If you're always going FPW then there's no point in the rest of the record.

I think it's a simple matter to mark them XLP_BKP_REMOVABLE and to skip
any optimization of the remainder of WAL records.

> The point here was to find problems so that users could run normally with
> confidence.

Yes, but a full overwrite mode would provide an even safer mode of operation.

> The cases you might want to run in the mode you describe are the build farm
> or integration testing. When testing your application on the next release
> of postgres it would be nice to have tests for the replication in your
> workload given the experience in 9.3.
>
> Even without the constant full page writes a live production system could do
> a FPW comparison after a FPW if it was in a consistent state. That would
> give standbys periodic verification at low costs.

Yes, the two options I proposed are somewhat independent of each other.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Production block comparison facility

From
Michael Paquier
Date:



On Tue, Jul 22, 2014 at 4:49 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> Then, looking at the code, we would need to tweak XLogInsert for the
> WAL record construction to always do a FPW and to update
> XLogCheckBufferNeedsBackup. Then for the redo part, we would need to
> do some extra operations in the area of
> RestoreBackupBlock/RestoreBackupBlockContents, including masking
> operations before comparing the content of the FPW and the current
> page.
>
> Does that sound right?

I have spent some time digging more into this idea and finished with the
patch attached, which does the following: it adds a consistency check
when a FPW is restored and applied on a given page.

The consistency check is made of two phases:
- Apply a mask on the FPW and the current page to eliminate potential
conflicts, like hint bits for example.
- Check that the FPW is consistent with the current page, i.e. that the
current page does not contain any new information that the FPW does not
have. This is done by checking the masked portions of the FPW and the
current page.

Also some more details:
- If an inconsistency is found, a WARNING is simply logged.
- The consistency check is done if the current page is not empty, and if
the database has reached a consistent state.
- The page masking API is taken from the WAL replay patch that was
submitted in CF1 and plugged in as an independent set of APIs.
- In the masking code, to make it easier to detect whether a page is used
by a sequence relation, SEQ_MAGIC as well as its opaque data structure are
renamed and moved into sequence.h.
- To facilitate debugging and comparison, the masked FPW and current page
are also converted into hex.

Things could be refactored and improved for sure, but this patch is already
useful as-is so I am going to add it to the next commit fest.

Comments are welcome.
Regards,
--
Michael
Attachment
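
The masking phase described in the message above can be sketched in C as follows. This is only an illustration under stated assumptions, not the code of the attached patch: the toy page is a flat 16-byte array, the hint bits are assumed to live in one byte at a made-up offset, and all names (toy_mask_page, toy_pages_consistent, the TOY_* constants) are hypothetical. For simplicity the sketch tests for equality after masking, which is stricter than the "no new information" check the patch describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE    16
#define TOY_HINT_OFFSET  3      /* hypothetical location of the hint bits */
#define TOY_MASK_MARKER  0xFF   /* value written over masked bytes */

/* Overwrite the bytes that may legitimately differ between two copies. */
void
toy_mask_page(unsigned char *page)
{
    assert(page != NULL);
    page[TOY_HINT_OFFSET] = TOY_MASK_MARKER;
}

/* Mask private copies of both pages, then compare them byte by byte. */
bool
toy_pages_consistent(const unsigned char *fpw, const unsigned char *cur)
{
    unsigned char masked_fpw[TOY_PAGE_SIZE];
    unsigned char masked_cur[TOY_PAGE_SIZE];

    memcpy(masked_fpw, fpw, TOY_PAGE_SIZE);
    memcpy(masked_cur, cur, TOY_PAGE_SIZE);
    toy_mask_page(masked_fpw);
    toy_mask_page(masked_cur);

    return memcmp(masked_fpw, masked_cur, TOY_PAGE_SIZE) == 0;
}
```

Masking copies rather than the pages themselves keeps the check free of side effects on the buffer being replayed.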

Re: Production block comparison facility

From
Simon Riggs
Date:
On 23 July 2014 15:14, Michael Paquier <michael.paquier@gmail.com> wrote:

> I have spent some time digging more into this idea and finished with the
> patch attached

Thank you for investigating the idea. I'll review by Monday.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Production block comparison facility

From
Michael Paquier
Date:
On Thu, Jul 24, 2014 at 12:35 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 23 July 2014 15:14, Michael Paquier <michael.paquier@gmail.com> wrote:
>
>> I have spent some time digging more into this idea and finished with the
>> patch attached
>
> Thank you for investigating the idea. I'll review by Monday.
OK, thanks. Here are a couple of things that are not really necessary
for the feature, but that I did to facilitate tests with the patch as
well as its review:
- Some information is logged to the user as DEBUG1 even if the current
page and the FPW are consistent. It may be better removed.
- The FPW/page consistency check is done after converting them to hex.
It is done this way only to facilitate viewing the page diffs with a
debugger. A better method would be to perform the checks using
MASK_MARKER (which should be moved to bufmask.h btw). It may be better
to put all this hex magic within a WAL_DEBUG ifdef.
Regards,
-- 
Michael



Re: Production block comparison facility

From
Andres Freund
Date:
On 2014-07-24 20:35:04 +0900, Michael Paquier wrote:
> - FPW/page consistency check is done after converting them to hex.
> This is done only this way to facilitate viewing the page diffs with a
> debugger. A best method would be to perform the checks using
> MASK_MARKER (which should be moved to bufmask.h btw). It may be better
> to put all this hex magic within a WAL_DEBUG ifdef.

Can't you just do "p/x whatever" in the debugger to display things in
hex?

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Production block comparison facility

From
Michael Paquier
Date:
On Thu, Jul 24, 2014 at 8:36 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-07-24 20:35:04 +0900, Michael Paquier wrote:
>> - FPW/page consistency check is done after converting them to hex.
>> This is done only this way to facilitate viewing the page diffs with a
>> debugger. A best method would be to perform the checks using
>> MASK_MARKER (which should be moved to bufmask.h btw). It may be better
>> to put all this hex magic within a WAL_DEBUG ifdef.
>
> Can't you just do "p/x whatever" in the debugger to display things in
> hex?
Well yes :p
-- 
Michael



Re: Production block comparison facility

From
Heikki Linnakangas
Date:
On 07/23/2014 05:14 PM, Michael Paquier wrote:
> On Tue, Jul 22, 2014 at 4:49 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>
>> Then, looking at the code, we would need to tweak XLogInsert for the
>> WAL record construction to always do a FPW and to update
>> XLogCheckBufferNeedsBackup. Then for the redo part, we would need to
>> do some extra operations in the area of
>> RestoreBackupBlock/RestoreBackupBlockContents, including masking
>> operations before comparing the content of the FPW and the current
>> page.
>>
>> Does that sound right?
>>
>
> I have spent some time digging more into this idea and finished with the
> patch attached, doing the following: addition of a consistency check when
> FPW is restored and applied on a given page.
>
> The consistency check is made of two phases:
> - Apply a mask on the FPW and the current page to eliminate potential
> conflicts like hint bits for example.
> - Check that the FPW is consistent with the current page, aka the current
> page does not contain any new information that the FPW taken has not. This
> is done by checking the masked portions of the FPW and the current page.
> Also some more details:
> - If an inconsistency is found, a WARNING is simply logged.
> - The consistency check is done if current page is not empty, and if
> database has reached a consistent state.
> - The page masking API is taken from the WAL replay patch that was
> submitted in CF1 and plugged in as an independent set of API.
> - In masking stuff, to facilitate if a page is used by a sequence relation
> SEQ_MAGIC as well as the its opaque data structure are renamed and moved
> into sequence.h.
> - To facilitate debugging and comparison, the masked FPW and current page
> are also converted into hex.
> Things could be refactored and improved for sure, but this patch is already
> useful as-is so I am going to add it to the next commit fest.

I don't understand how this works. A full-page image contains the new 
page contents *after* the WAL-logged operation. For example, in a heap 
insert, the full-page image contains the new tuple. How can you compare 
that with what's on the disk already?

ISTM you'd need to log two full-page images for every WAL record. A 
before image and an after image. Then you could do a lot of checking:

1. the before image should match what's on disk already
2. the result after applying the WAL record should match the after image.

That would be more handy than the approach I used, where the page images 
are logged to a separate file. You wouldn't need to deal with any new 
files, as all the data is in the WAL. Verification would be done 
directly in the standby, with no need to run any extra programs.

- Heikki
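
The two checks listed above (before image against the existing page, replay result against the after image) could look roughly like this in C. It is a sketch under assumptions, not a proposed implementation: pages are flat 16-byte arrays, no masking is applied, and every name here (ToyRedoFn, toy_verify_record, toy_redo_set_byte2, the VERIFY_* values) is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Hypothetical redo callback: applies one WAL record to a page in place. */
typedef void (*ToyRedoFn) (unsigned char *page);

typedef enum
{
    VERIFY_OK,
    VERIFY_BEFORE_MISMATCH,     /* check 1 failed */
    VERIFY_AFTER_MISMATCH       /* check 2 failed */
} VerifyResult;

VerifyResult
toy_verify_record(unsigned char *page,
                  const unsigned char *before_image,
                  const unsigned char *after_image,
                  ToyRedoFn redo)
{
    assert(redo != NULL);

    /* 1. the before image should match what is on disk already */
    if (memcmp(page, before_image, TOY_PAGE_SIZE) != 0)
        return VERIFY_BEFORE_MISMATCH;

    /* 2. the result after applying the record should match the after image */
    redo(page);
    if (memcmp(page, after_image, TOY_PAGE_SIZE) != 0)
        return VERIFY_AFTER_MISMATCH;

    return VERIFY_OK;
}

/* Example redo routine: a toy "insert" that sets one byte at offset 2. */
void
toy_redo_set_byte2(unsigned char *page)
{
    page[2] = 0x07;
}
```

Running the same record twice against the same page trips check 1 the second time, since the page no longer matches the before image.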




Re: Production block comparison facility

From
Michael Paquier
Date:
On Tue, Jul 29, 2014 at 7:30 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> I don't understand how this works. A full-page image contains the new page
> contents *after* the WAL-logged operation. For example, in a heap insert,
> the full-page image contains the new tuple. How can you compare that with
> what's on the disk already?

An exact match of the FPW and the current page is not done, the patch
as it stands now checks if a FPW is consistent with the content of
current page by checking if it does not include changes that diverge
from what the FPW has.
For example for a heap insert, if current page has N records
pointer1/tup1..pointerN/tupN, FPW should only contain (N+1) records
pointer1/tup1..pointer(N+1)/tup(N+1). After applying the mask at block
recovery, process simply checks that the FPW and current page contain
the first N records, marking FPW and current page as inconsistent if
the current page has some garbage like some extra tuple entries not in
the FPW. I am sure you have arguments against that though...

> ISTM you'd need to log two full-page images for every WAL record. A before
> image and an after image.
The after image is the current FPW, so there is nothing else to do for
it. But for the before buffer, what do you think about using
ReadBufferExtended with RBM_NORMAL? We could grab its content from
disk in XLogInsert only when we are sure that a backup block is added.

> Then you could do a lot of checking:
> 1. the before image should match what's on disk already
> 2. the result after applying the WAL record should match the after image.
A WAL record can contain up to XLR_MAX_BKP_BLOCKS backup blocks.
Should we double it from 4 to 8?

> That would be more handy than the approach I used, where the page images are
> logged to a separate file. You wouldn't need to deal with any new files, as
> all the data is in the WAL. Verification would be done directly in the
> standby, with no need to run any extra programs.
In this case, would it be better to control that with a GUC? Making that
the default will increase the amount of WAL for all types of
applications, except if coupled with FPW compression...
Regards,
-- 
Michael
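
The heap-insert example given in the message above (the current page holds N entries, the FPW holds those plus the new one) amounts to a prefix check, sketched here in C. The page layout is a deliberate simplification: a counter plus an array of integers standing in for the pointer/tuple pairs, with hypothetical names throughout (ToyHeapPage, toy_fpw_extends_page, TOY_MAX_ITEMS). Unlike the description above, the sketch does not insist that the FPW has exactly N+1 entries, only that it has at least the N entries of the current page.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ITEMS 8

/* Toy page: each int stands in for one line pointer/tuple pair. */
typedef struct
{
    int         nitems;
    int         items[TOY_MAX_ITEMS];
} ToyHeapPage;

/*
 * Return true if every entry of the current page also appears, at the
 * same position, in the full-page image.
 */
bool
toy_fpw_extends_page(const ToyHeapPage *fpw, const ToyHeapPage *cur)
{
    assert(fpw->nitems >= 0 && fpw->nitems <= TOY_MAX_ITEMS);
    assert(cur->nitems >= 0 && cur->nitems <= TOY_MAX_ITEMS);

    /* The current page has entries that the FPW lacks. */
    if (cur->nitems > fpw->nitems)
        return false;

    /* The first N entries must be identical in both pages. */
    for (int i = 0; i < cur->nitems; i++)
    {
        if (cur->items[i] != fpw->items[i])
            return false;
    }
    return true;
}
```

A check of this kind cannot detect a wrong value in the newly added entry itself, which is the gap the before/after discussion in this thread is about.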



Re: Production block comparison facility

From
Simon Riggs
Date:
On 29 July 2014 11:30, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> I don't understand how this works. A full-page image contains the new page
> contents *after* the WAL-logged operation. For example, in a heap insert,
> the full-page image contains the new tuple. How can you compare that with
> what's on the disk already?
>
> ISTM you'd need to log two full-page images for every WAL record. A before
> image and an after image. Then you could do a lot of checking:
>
> 1. the before image should match what's on disk already
> 2. the result after applying the WAL record should match the after image.
>
> That would be more handy than the approach I used, where the page images are
> logged to a separate file. You wouldn't need to deal with any new files, as
> all the data is in the WAL. Verification would be done directly in the
> standby, with no need to run any extra programs.

It doesn't matter whether we take a before or after image of the page.

What is important is that we make the check on the standby at the same
point as the full page was taken on the master. After all, the pages
are marked as removable.

Given the pages are after images, then we just make the check after
applying WAL.

So I don't see the need for two full page images.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Production block comparison facility

From
Michael Paquier
Date:
On Thu, Jul 31, 2014 at 2:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 29 July 2014 11:30, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
>> I don't understand how this works. A full-page image contains the new page
>> contents *after* the WAL-logged operation. For example, in a heap insert,
>> the full-page image contains the new tuple. How can you compare that with
>> what's on the disk already?
>>
>> ISTM you'd need to log two full-page images for every WAL record. A before
>> image and an after image. Then you could do a lot of checking:
>>
>> 1. the before image should match what's on disk already
>> 2. the result after applying the WAL record should match the after image.
>>
>> That would be more handy than the approach I used, where the page images are
>> logged to a separate file. You wouldn't need to deal with any new files, as
>> all the data is in the WAL. Verification would be done directly in the
>> standby, with no need to run any extra programs.
>
> It doesn't matter whether we take a before or after image of the page.
>
> What is important is that we make the check on the standby at the same
> point as the full page was taken on the master. After all, the pages
> are marked as removable.
>
> Given the pages are after images, then we just make the check after
> applying WAL.
>
> So I don't see the need for two full page images.
By doing so you definitely need an additional mode for full-page
writes: one telling the process not to apply this FPW, because it
is to be compared with the current page after applying WAL. This
increases the footprint of the feature on the code, because all the
code paths where RestoreBackupBlock is called need to be bypassed.
-- 
Michael



Re: Production block comparison facility

From
Simon Riggs
Date:
On 31 July 2014 07:45, Michael Paquier <michael.paquier@gmail.com> wrote:

>> So I don't see the need for two full page images.

> By doing so you definitely need an additional mode for full-page
> writes: one certifying that process does not apply this FPW because it
> wants to compare it to current page after applying the WALs. This
> increases the footprint of the feature on code because all the code
> paths where RestoreBackupBlock is called need to be bypassed.

Yeh, it looks like you need to do CheckBackupBlock() exactly as many
times as you do RestoreBackupBlock(), with the sequence of actions
being RestoreBackupBlock(), apply WAL then CheckBackupBlock(). That
will work without much code churn, it will be just a one line addition
in a few dozen places.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Production block comparison facility

From
Michael Paquier
Date:
On Thu, Jul 31, 2014 at 4:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Yeh, it looks like you need to do CheckBackupBlock() exactly as many
> times as you do RestoreBackupBlock(), with the sequence of actions
> being RestoreBackupBlock(), apply WAL then CheckBackupBlock(). That
> will work without much code churn, it will be just a one line addition
> in a few dozen places.
Additionally, as this is a recovery-only feature, I was thinking that
it would be better to control it with a parameter of recovery.conf.
Let's call it check_full_page_writes for example. Thoughts?
-- 
Michael



Re: Production block comparison facility

From
Michael Paquier
Date:
On Wed, Jul 23, 2014 at 11:14 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Things could be refactored and improved for sure, but this patch is already
> useful as-is so I am going to add it to the next commit fest.

After some more investigation, I am going to mark this patch as
"Returned with feedback" for the time being (mainly to let it show up
on the commit fest app and for the sake of archives), for two
reasons:
- We can do better than what I sent: instead of checking if the FPW
and the current page are somewhat consistent, we could actually check
if the current page is equal to the FPW after applying WAL on it. In
order to do that, we would need to bypass the FPW replay and to apply
WAL on the current page (if the page is already initialized), then
have RestoreBackupBlock (or its equivalent) report with an additional
flag that the block is "not restored, but can get WAL applied to
it safely". Then a comparison with the FPW contained in the WAL record
can be made.
- The patch of Heikki to change the WAL APIs and track more easily the
blocks changes is going to make this implementation far easier. It
also improves the status checks on which block has been restored, so
it is more easily extensible for what could be done here.

Regards,
-- 
Michael
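
The first point of the message above (bypass the FPW replay, apply WAL on the current page, then compare with the FPW) can be sketched in C as below. This is an illustration only and does not reflect the real RestoreBackupBlock interface: the result enum, the function names, the check_mode switch and the flat 16-byte pages are all assumptions made for the example, and the masking step is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Hypothetical callback applying one WAL record to a page in place. */
typedef void (*ToyApplyFn) (unsigned char *page);

typedef enum
{
    TOY_BLK_RESTORED,       /* FPW copied over the page, no redo needed */
    TOY_BLK_NEEDS_REDO      /* not restored, but WAL can be applied safely */
} ToyRestoreResult;

/* In check mode, leave an already initialized page alone. */
ToyRestoreResult
toy_restore_block(unsigned char *page, const unsigned char *fpw,
                  bool page_initialized, bool check_mode)
{
    if (check_mode && page_initialized)
        return TOY_BLK_NEEDS_REDO;

    memcpy(page, fpw, TOY_PAGE_SIZE);
    return TOY_BLK_RESTORED;
}

/* Returns false only when replay and the FPW disagree. */
bool
toy_replay_and_check(unsigned char *page, const unsigned char *fpw,
                     bool page_initialized, bool check_mode,
                     ToyApplyFn apply)
{
    assert(apply != NULL);

    if (toy_restore_block(page, fpw, page_initialized, check_mode)
        == TOY_BLK_RESTORED)
        return true;        /* page now equals the FPW by construction */

    apply(page);            /* replay the record on the existing page */
    return memcmp(page, fpw, TOY_PAGE_SIZE) == 0;
}

/* Example record: a toy change that sets one byte at offset 1. */
void
toy_apply_set_byte1(unsigned char *page)
{
    page[1] = 0x05;
}
```

An uninitialized page still takes the plain restore path, so the comparison only happens where there is an existing page to replay onto.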