Re: block-level incremental backup - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: block-level incremental backup
Msg-id 1148d018-ff98-3857-20b8-45179c0742a3@postgrespro.ru
In response to block-level incremental backup  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: block-level incremental backup
List pgsql-hackers

On 09.04.2019 18:48, Robert Haas wrote:
> 1. There should be a way to tell pg_basebackup to request from the
> server only those blocks where LSN >= threshold_value.

Some time ago I implemented an alternative version of the ptrack utility
(not the one used in pg_probackup)
which detects updated blocks at the file level. It is very simple, and maybe
it could at some point be integrated into master.
I have attached a patch against vanilla Postgres to this mail.
Right now it contains just two GUCs:

ptrack_map_size: size of the ptrack map (number of elements) used for
incremental backup; 0 disables it.
ptrack_block_log: base-2 logarithm of the ptrack block size (in pages)

and one function:

pg_ptrack_get_changeset(startlsn pg_lsn) returns 
{relid,relfilenode,reltablespace,forknum,blocknum,segsize,updlsn,path}
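A client that filters or compares the updlsn values returned by pg_ptrack_get_changeset has to interpret the pg_lsn text form, which Postgres prints as two 32-bit hexadecimal halves, "X/Y". A minimal Python sketch of that conversion follows; the helper names parse_lsn and format_lsn are hypothetical and not part of the patch.

```python
def parse_lsn(text: str) -> int:
    """Convert pg_lsn text 'X/Y' (two 32-bit hex halves) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Convert a 64-bit integer back to the 'X/Y' text form."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


# An updlsn is "newer" than the start LSN if its integer value is larger.
print(parse_lsn("0/224FD88") > parse_lsn("0/224A268"))  # True
```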

The idea is very simple: it creates a hash map of fixed size (ptrack_map_size)
and stores the LSN of written pages in this map.
Since the default Postgres page size seems too small for a ptrack
block (it would require either too large a hash map or cause more
collisions, as well as more random reads), a ptrack block can be
configured to consist of multiple pages (a power of 2).
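To make the idea concrete, here is a toy Python model of such a map. It is only a sketch under assumptions: the slot function (Python's hash() of a tuple, modulo the map size), keeping the maximum LSN per slot, and the class and method names are all hypothetical, not taken from the patch. It illustrates why collisions do not break correctness in this model: they can only make an unchanged block look changed, never hide a changed one.

```python
class PtrackMap:
    """Toy fixed-size map from (relfilenode, fork, ptrack block) to last LSN."""

    def __init__(self, map_size: int, block_log: int = 0):
        self.map_size = map_size      # stands in for ptrack_map_size
        self.block_log = block_log    # stands in for ptrack_block_log
        self.lsns = [0] * map_size    # 0 means "not written since the map was created"

    def _slot(self, relfilenode: int, forknum: int, blocknum: int) -> int:
        # All 2**block_log pages of one ptrack block share a single slot.
        ptrack_block = blocknum >> self.block_log
        return hash((relfilenode, forknum, ptrack_block)) % self.map_size

    def mark_written(self, relfilenode: int, forknum: int, blocknum: int, lsn: int) -> None:
        slot = self._slot(relfilenode, forknum, blocknum)
        # Keep the newest LSN; a collision can only make a slot look newer.
        self.lsns[slot] = max(self.lsns[slot], lsn)

    def maybe_changed_since(self, relfilenode: int, forknum: int, blocknum: int,
                            start_lsn: int) -> bool:
        # Collisions may give false positives, but a written block is never missed.
        return self.lsns[self._slot(relfilenode, forknum, blocknum)] >= start_lsn


m = PtrackMap(map_size=1000003, block_log=2)
m.mark_written(16396, 0, 1640, 0x224FD88)
print(m.maybe_changed_since(16396, 0, 1641, 0x224A268))  # True: same 4-page ptrack block
```

With block_log = 2, pages 1640..1643 map to one slot, so a backup tool would copy all four pages whenever any of them changed.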

This patch uses a memory mapping mechanism. Unfortunately, there is no
portable wrapper for it in Postgres, so I had to provide my own
implementations for Unix and Windows. This is certainly not good and
should be rewritten.
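For comparison, Python's standard mmap module wraps both mmap(2) and the Windows file-mapping API behind one interface, which is the kind of portable wrapper the paragraph above says Postgres lacks. The sketch below keeps the LSN map in a memory-mapped file; the file layout (one little-endian 64-bit LSN per slot) and the helper names open_lsn_map, store_lsn and load_lsn are assumptions for illustration, not the patch's format.

```python
import mmap
import os
import struct
import tempfile

LSN = struct.Struct("<Q")  # assumed layout: one little-endian uint64 LSN per slot


def open_lsn_map(path: str, map_size: int) -> mmap.mmap:
    """Create (if needed) and memory-map a file holding map_size LSN slots."""
    nbytes = map_size * LSN.size
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size < nbytes:
            os.ftruncate(fd, nbytes)  # new slots are zero-filled
        return mmap.mmap(fd, nbytes)  # same call on Unix and Windows
    finally:
        os.close(fd)  # the mapping keeps its own handle to the file


def store_lsn(m: mmap.mmap, slot: int, lsn: int) -> None:
    LSN.pack_into(m, slot * LSN.size, lsn)


def load_lsn(m: mmap.mmap, slot: int) -> int:
    return LSN.unpack_from(m, slot * LSN.size)[0]
```

Because the map lives in a file, its contents can survive a restart, provided the mapping is flushed.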

How to use?

1. Define ptrack_map_size in postgresql.conf, for example (use a prime
number for more uniform hashing):

ptrack_map_size = 1000003

2. Remember the current LSN.

$ psql postgres -c "select pg_current_wal_lsn()"
  pg_current_wal_lsn
--------------------
  0/224A268
(1 row)

3. Do some updates.

$ pgbench -T 10 postgres

4. Select changed blocks.

  select * from pg_ptrack_get_changeset('0/224A268');
 relid | relfilenode | reltablespace | forknum | blocknum | segsize |  updlsn   |       path
-------+-------------+---------------+---------+----------+---------+-----------+------------------
 16390 |       16396 |          1663 |       0 |     1640 |       1 | 0/224FD88 | base/12710/16396
 16390 |       16396 |          1663 |       0 |     1641 |       1 | 0/2258680 | base/12710/16396
 16390 |       16396 |          1663 |       0 |     1642 |       1 | 0/22615A0 | base/12710/16396
...
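Step 1 recommends a prime map size (1000003 is prime) for more uniform hashing. The small Python illustration below assumes a plain "key modulo map size" slot function, which may differ from the patch's actual hash: keys sharing a power-of-two stride pile into few slots when the map size is a power of two, but spread out when it is prime.

```python
# Keys with a common stride, e.g. every 4096th block number (illustrative only).
keys = [i * 4096 for i in range(1000)]


def distinct_slots(keys, map_size):
    """Number of distinct slots the keys occupy under simple modulo hashing."""
    return len({k % map_size for k in keys})


print(distinct_slots(keys, 1 << 20))  # prints 256: power-of-two size
print(distinct_slots(keys, 1000003))  # prints 1000: prime size
```

With 2^20 slots, 4096 * i repeats every 256 values of i; with a prime modulus that does not divide 4096, all 1000 keys land in different slots.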

Of course, ptrack should be used as part of a backup tool (such as
pg_basebackup or pg_probackup).
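As a rough idea of what such a tool would do with each row, the Python sketch below turns a changeset row into the segment file and byte offset to copy. It assumes default build settings (8 kB pages, 1 GB segment files, hence 131072 blocks per segment), that path names the first segment of the fork, and that blocknum counts from the start of the fork; the segsize column is not used because its exact meaning is not spelled out here. ChangedBlock and block_location are hypothetical names.

```python
from typing import NamedTuple, Tuple

BLCKSZ = 8192         # default page size in bytes
RELSEG_SIZE = 131072  # default blocks per 1 GB segment file (1 GB / 8 kB)


class ChangedBlock(NamedTuple):
    """One row of pg_ptrack_get_changeset() output."""
    relid: int
    relfilenode: int
    reltablespace: int
    forknum: int
    blocknum: int
    segsize: int
    updlsn: str
    path: str


def block_location(row: ChangedBlock) -> Tuple[str, int]:
    """Return (segment file, byte offset) to read for one changed block.

    Assumes row.path names the first segment of the fork and that
    row.blocknum counts from the start of the relation fork.
    """
    segno, block_in_seg = divmod(row.blocknum, RELSEG_SIZE)
    seg_path = row.path if segno == 0 else f"{row.path}.{segno}"
    return seg_path, block_in_seg * BLCKSZ


row = ChangedBlock(16390, 16396, 1663, 0, 1640, 1, "0/224FD88", "base/12710/16396")
print(block_location(row))  # ('base/12710/16396', 13434880)
```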


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

