Thread: [HACKERS] Adding hook in BufferSync for backup purposes

[HACKERS] Adding hook in BufferSync for backup purposes

From
Андрей Бородин
Date:
Hi, hackers!
 
I want to propose adding hook in BufferSync.
 
==Why==
So that extensions could track pages changed between checkpoints. I think this can allow efficient differential backups taken right after checkpoint. And this functionality can be implemented as an extension.
 
==What==
I propose to add hook inside BufferSync() function it inform extensions that we are going to write pages to disk. Please see patch attached. I pass a timestamp of the checkpoint, but it would be good if we could also pass there number of checkpoint or something like this to ensure that some checkpoints were not lost (this could yield malformed backups).
 
==State==
This is just an idea to discuss, I could not find something like this in pgsql-hackers as for now. Neither I could find similar hooks in the code.
Is this hook sufficient to implement page tracking for differential backups? I’m not sure, but seems like it is.
 
==Questions==
Is this call enough to gather information about changed pages for backup purposes?
Can we call extensions reliably from checkpointer process?
Can we guaranty that extension won’t miss our call or we will defer BufferSync if extension have failed?
Can we provide more flexibility for this hook?
 
Any thought will be appreciated.
 
Best regards, Andrey Borodin, Yandex.
Attachment

Re: [HACKERS] Adding hook in BufferSync for backup purposes

From
Alvaro Herrera
Date:
Андрей Бородин wrote:

> ==What==
> I propose to add hook inside BufferSync() function it inform extensions that we
> are going to write pages to disk. Please see patch attached. I pass a timestamp
> of the checkpoint, but it would be good if we could also pass there number of
> checkpoint or something like this to ensure that some checkpoints were not lost
> (this could yield malformed backups).
>  
> ==State==
> This is just an idea to discuss, I could not find something like this in
> pgsql-hackers as for now. Neither I could find similar hooks in the code.
> Is this hook sufficient to implement page tracking for differential backups?
> I’m not sure, but seems like it is.

Hi,

I remember discussing the topic of differential base-backups with
somebody (probably Marco and Gabriele).  The idea we had was to have a
new relation fork which stores an LSN for each group of pages,
indicating the LSN of the newest change to those pages.  The backup tool
"scans" the whole LSN fork, and grabs images of all pages that have LSNs
newer than the one used for the previous base backup.

(I think your sketch above should use LSNs rather than timestamps).

I suppose your hook idea lets you implement the LSN fork in an
extension, rather than having it be part of core.  The idea of hooking
onto BufferSync makes me uneasy, though -- like it's not the correct
place to do it.  I think it should be at the point where the buffer is
modified (i.e. when WAL is written) rather than when it's checkpointed
out.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Adding hook in BufferSync for backup purposes

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I suppose your hook idea lets you implement the LSN fork in an
> extension, rather than having it be part of core.  The idea of hooking
> onto BufferSync makes me uneasy, though -- like it's not the correct
> place to do it.

Yeah.  Keep in mind that if the extension does anything at all that could
possibly throw an error, and if that error condition persists across
multiple tries, you will have broken the database completely: it will
be impossible to complete a checkpoint, and your WAL segment pool will
grow until it exhausts disk.  So the idea of doing something that involves
unspecified extension behavior, especially possible interaction with
an external backup agent, right there is pretty terrifying.

Other problems with the proposed patch: it misses coverage of
BgBufferSync, and I don't like exposing an ad-hoc structure like
CkptTsStatus as part of an extension API.  The algorithm used by
BufferSync to schedule buffer writes has changed multiple times
before and doubtless will again; if we're going to have a hook
here it should depend as little as possible on those details.
        regards, tom lane



Re: [HACKERS] Adding hook in BufferSync for backup purposes

From
Andrey Borodin
Date:
Alvaro, Tom, thank you for your valuable comments.


> Alvaro:
> I remember discussing the topic of differential base-backups with
> somebody (probably Marco and Gabriele).  The idea we had was to have a
> new relation fork which stores an LSN for each group of pages,
> indicating the LSN of the newest change to those pages.  The backup tool
> "scans" the whole LSN fork, and grabs images of all pages that have LSNs
> newer than the one used for the previous base backup.
Thanks for the pointer, I’ve found the discussions and now I’m in a process of extraction of the knowledge from there

> I think it should be at the point where the buffer is
> modified (i.e. when WAL is written) rather than when it's checkpointed
> out.
WAL is flushed before actual pages are written to disk(sent to kernel). I’d like to notify extensions right after we
exactlysure pages were flushed. 
But you are right, BufferSync is not good place for this:
1. It lacks LSNs
2. It’s not the only place to flush: bgwriter goes through nearby function FlushBuffer() and many AMs write directly to
smgr(for example when matapge is created) 

BufferSync() seemed sooo comfortable and efficient place for flashing info on dirty pages, already sorted and grouped
bytablespace, but it is absolutely incorrect to do it there. I’ll look for the better place. 

>
> 7 авг. 2017 г., в 18:37, Tom Lane <tgl@sss.pgh.pa.us> написал(а):
>
> Yeah.  Keep in mind that if the extension does anything at all that could
> possibly throw an error, and if that error condition persists across
> multiple tries, you will have broken the database completely: it will
> be impossible to complete a checkpoint, and your WAL segment pool will
> grow until it exhausts disk.  So the idea of doing something that involves
> unspecified extension behavior, especially possible interaction with
> an external backup agent, right there is pretty terrifying.
I think that API for extensions should tend to protect developer from breaking everything, but may allow it with
precautionwarnings in docs and comments. Please let me know if this assumption is incorrect. 

>
> Other problems with the proposed patch: it misses coverage of
> BgBufferSync, and I don't like exposing an ad-hoc structure like
> CkptTsStatus as part of an extension API.  The algorithm used by
> BufferSync to schedule buffer writes has changed multiple times
> before and doubtless will again; if we're going to have a hook
> here it should depend as little as possible on those details.
OK, now I see that «buf_internals.h» had word internals for a reason. Thanks for pointing that out, I didn’t knew about
changesin these algorithms. 

Best regards, Andrey Borodin, Yandex.




Re: [HACKERS] Adding hook in BufferSync for backup purposes

From
Tom Lane
Date:
Andrey Borodin <x4mmm@yandex-team.ru> writes:
> 7 авг. 2017 г., в 18:37, Tom Lane <tgl@sss.pgh.pa.us> написал(а):
>> Yeah.  Keep in mind that if the extension does anything at all that could
>> possibly throw an error, and if that error condition persists across
>> multiple tries, you will have broken the database completely: it will
>> be impossible to complete a checkpoint, and your WAL segment pool will
>> grow until it exhausts disk.  So the idea of doing something that involves
>> unspecified extension behavior, especially possible interaction with
>> an external backup agent, right there is pretty terrifying.

> I think that API for extensions should tend to protect developer from
> breaking everything, but may allow it with precaution warnings in docs
> and comments. Please let me know if this assumption is incorrect.

My point is not to claim that we mustn't put a hook there.  It's that what
such a hook could safely do is tightly constrained, and you've not offered
clear evidence that there's something useful to be done within those
constraints.  Alvaro seems to think that the result might be better
reached by hooking in at other places, and my gut reaction is similar.
        regards, tom lane



Re: [HACKERS] Adding hook in BufferSync for backup purposes

From
Andrey Borodin
Date:
Hi hackers!

> 8 авг. 2017 г., в 11:27, Tom Lane <tgl@sss.pgh.pa.us> написал(а):
>
> My point is not to claim that we mustn't put a hook there.  It's that what
> such a hook could safely do is tightly constrained, and you've not offered
> clear evidence that there's something useful to be done within those
> constraints.  Alvaro seems to think that the result might be better
> reached by hooking in at other places, and my gut reaction is similar.
>


I was studying through work done by Marco and Gabriel on the matter of tracking pages for the incremental backup, and I
havefound PTRACK patch by Yury Zhuravlev and PostgresPro [0]. This work seems to be solid and thoughtful. I have
drafteda new prototype for hooks enabling incremental backup as extension based on PTRACK calls. 

If you look at the original patch you can see that attached patch differs slightly: functionality to track end of
criticalsection is omitted. I have not included it in this draft because it changes very sensitive place even for those
whodo not need it. 

Subscriber of proposed hook must remember that it is invoked under critical section. But there cannot me more than
XLR_MAX_BLOCK_IDblocks for one transaction. Proposed draft does not add any functionality to track finished
transactionsor any atomic unit of work, just provides a flow of changed block numbers. Neither does this draft provide
anyassumption on where to store this information. I suppose subscribers could trigger asynchronous writes somewhere as
longas info for given segment is accumulated (do we need a hook on segment end then?). During inremental backup you can
skipscanning those WAL segments for which you have accumulated changeset of block numbers. 

The thing which is not clear to me: if we are accumulating blocknumbers under critical section, then we must place them
topreallocated array. When is the best time to take away these blocknumbers to empty that array and avoid overflow?
PTRACKhas array of XLR_MAX_BLOCK_ID length and saves these array during the end of each critical section. But I want to
avoidintervention into critical sections. 

Thank you for your attention, any thoughts will be appreciated.

Best regards, Andrey Borodin.

[0] https://gist.github.com/stalkerg/ab833d94e2f64df241f1835651e06e4b

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment