Re: block-level incremental backup - Mailing list pgsql-hackers

From Jeevan Ladhe
Subject Re: block-level incremental backup
Date
Msg-id CAOgcT0OE9SBBZ63p3jt0EbcbMHJ8T-qDFUL-g7AZrujwe4QVqA@mail.gmail.com
Whole thread Raw
In response to Re: block-level incremental backup  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
List pgsql-hackers

Hi Jeevan


I reviewed first two patches -


0001-Add-support-for-command-line-option-to-pass-LSN.patch and

0002-Add-TAP-test-to-test-LSN-option.patch


from the set of incremental backup patches, and the changes look good to me.


I had some concerns around the way we are working around with the fact that

pg_lsn_in() accepts the lsn with 0 as a valid lsn and I think that itself is

contradictory to the definition of InvalidXLogRecPtr. I have started a separate

new thread[1] for the same.


Also, I observe that now commit 21f428eb, has already moved the lsn decoding

logic to a separate function pg_lsn_in_internal(), so the function

decode_lsn_internal() from patch 0001 will go away and the dependent code needs

to be modified.


I shall review the rest of the patches, and post the comments.


Regards,

Jeevan Ladhe


[1] https://www.postgresql.org/message-id/CAOgcT0NOM9oR0Hag_3VpyW0uF3iCU=BDUFSPfk9JrWXRcWQHqw@mail.gmail.com


On Thu, Jul 11, 2019 at 5:00 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
Hi Anastasia,

On Wed, Jul 10, 2019 at 11:47 PM Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote:
23.04.2019 14:08, Anastasia Lubennikova wrote:
> I'm volunteering to write a draft patch or, more likely, set of
> patches, which
> will allow us to discuss the subject in more detail.
> And to do that I wish we agree on the API and data format (at least
> broadly).
> Looking forward to hearing your thoughts.

Though the previous discussion stalled,
I still hope that we could agree on basic points such as a map file
format and protocol extension,
which is necessary to start implementing the feature.

It's great that you too come up with the PoC patch. I didn't look at your changes in much details but we at EnterpriseDB too working on this feature and started implementing it.

Attached series of patches I had so far... (which needed further optimization and adjustments though)

Here is the overall design (as proposed by Robert) we are trying to implement:

1. Extend the BASE_BACKUP command that can be used with replication connections. Add a new [ LSN 'lsn' ] option.

2. Extend pg_basebackup with a new --lsn=LSN option that causes it to send the option added to the server in #1.

Here are the implementation details when we have a valid LSN

sendFile() in basebackup.c is the function which mostly does the thing for us. If the filename looks like a relation file, then we'll need to consider sending only a partial file. The way to do that is probably:

A. Read the whole file into memory.

B. Check the LSN of each block. Build a bitmap indicating which blocks have an LSN greater than or equal to the threshold LSN.

C. If more than 90% of the bits in the bitmap are set, send the whole file just as if this were a full backup. This 90% is a constant now; we might make it a GUC later.

D. Otherwise, send a file with .partial added to the name. The .partial file contains an indication of which blocks were changed at the beginning, followed by the data blocks. It also includes a checksum/CRC.
Currently, a .partial file format looks like:
 - start with a 4-byte magic number
 - then store a 4-byte CRC covering the header
 - then a 4-byte count of the number of blocks included in the file
 - then the block numbers, each as a 4-byte quantity
 - then the data blocks


We are also working on combining these incremental back-ups with the full backup and for that, we are planning to add a new utility called pg_combinebackup. Will post the details on that later once we have on the same page for taking backup.

Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Built-in connection pooler
Next
From: Peter Geoghegan
Date:
Subject: Re: should there be a hard-limit on the number of transactionspending undo?