Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers

From Kashif Zeeshan
Subject Re: WIP/PoC for parallel backup
Date
Msg-id CAKfXphpurkUohwB=cHoBkKC-axVWx+38vj-tDM8MbutOJH-8uQ@mail.gmail.com
Whole thread Raw
In response to Re: WIP/PoC for parallel backup  (Asif Rehman <asifr.rehman@gmail.com>)
Responses Re: WIP/PoC for parallel backup  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
Re: WIP/PoC for parallel backup  (Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>)
List pgsql-hackers
Hi Asif

When a non-existent slot is used with tablespace then correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup.

Steps
=======

edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#


[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -j 2 -D  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$

backup folder not cleaned

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact               postgresql.conf
base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify     pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal      postgresql.auto.conf
[edb@localhost bin]$




If the same case is executed without the parallel backup patch then the backup folder is cleaned after the error is displayed.

[edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone


On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com> wrote:


On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.

Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.

TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.

Non-parallel backup already does the early error checking. I only intended

to make parallel behave the same as non-parallel here. So, I agree with

you that the behavior of parallel backup should be consistent with the

non-parallel one.  Please see the code snippet below from 

basebackup.c:sendDir()


/*

 * Check if the postmaster has signaled us to exit, and abort with an

 * error in that case. The error handler further up will call

 * do_pg_abort_backup() for us. Also check that if the backup was

 * started while still in recovery, the server wasn't promoted.

 * do_pg_stop_backup() will check that too, but it's better to stop

 * the backup early than continue to the end and fail there.

 */

CHECK_FOR_INTERRUPTS();

if (RecoveryInProgress() != backup_started_in_recovery)

ereport(ERROR,

(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

errmsg("the standby was promoted during online backup"),

errhint("This means that the backup being taken is corrupt "

"and should not be used. "

"Try taking another online backup.")));



> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.

Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.

> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?

Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.

I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:

START_BACKUP unique_backup_id [options]...

And then others do:

JOIN_BACKUP unique_backup_id

My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.

If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader processes' PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.


Okay so If we are to add another replication command ‘JOIN_BACKUP unique_backup_id’
to make workers find the relevant shared state. There won't be any need for changing
the grammar for any other command. The START_BACKUP can return the unique_backup_id
in the result set.

I am thinking of the following struct for shared state:

typedef struct

{

char backupid[NAMEDATALEN];

XLogRecPtr startptr;


slock_t lock;

int64 throttling_counter;

bool backup_started_in_recovery;

} BackupSharedState;



The shared state structure entries would be maintained by a shared hash table.
There will be one structure per parallel backup. Since a single parallel backup
can engage more than one wal sender, so I think max_wal_senders might be a little
too much; perhaps max_wal_senders/2 since there will be at least 2 connections
per parallel backup? Alternatively, we can set a new GUC that defines the maximum
number of for concurrent parallel backups i.e. ‘max_concurent_backups_allowed = 10’ 
perhaps, or we can make it user-configurable.

The key would be “backupid=hex_encode(pg_random_strong(16))”

Checking for Standby Promotion:
At the START_BACKUP command, we initialize BackupSharedState.backup_started_in_recovery 
and keep checking it whenever send_file () is called to send a new file.

Throttling:
BackupSharedState.throttling_counter - The throttling logic remains the same
as for non-parallel backup with the exception that multiple threads will now be
updating it. So in parallel backup, this will represent the overall bytes that
have been transferred. So the workers would sleep if they have exceeded the
limit. Hence, the shared state carries a lock to safely update the throttling
value atomically. 

Progress Reporting:
Although I think we should add progress-reporting for parallel backup as a
separate patch. The relevant entries for progress-reporting such as 
‘backup_total’ and ‘backup_streamed’ would be then added to this structure
as well.
 
 
Grammar:
There is a change in the resultset being returned for START_BACKUP command; 
unique_backup_id is added. Additionally, JOIN_BACKUP replication command is
added. SEND_FILES has been renamed to SEND_FILE. There are no other changes
to the grammar.

START_BACKUP [LABEL '<label>'] [FAST]
  - returns startptr, tli, backup_label, unique_backup_id
STOP_BACKUP [NOWAIT]
  - returns startptr, tli, backup_label
JOIN_BACKUP ‘unique_backup_id’
  - attaches a shared state identified by ‘unique_backup_id’ to a backend process.
 
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca



--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company

pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: User Interface for WAL usage data
Next
From: Michał Wadas
Date:
Subject: Proposal: is_castable