Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers

From Asif Rehman
Subject Re: WIP/PoC for parallel backup
Msg-id CADM=JegOqxdzTVdCshtqtuCO7mQDk7enQoYbaeuAjJKoeEq=Og@mail.gmail.com
In response to Re: WIP/PoC for parallel backup  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: WIP/PoC for parallel backup  (Robert Haas <robertmhaas@gmail.com>)
Re: WIP/PoC for parallel backup  (Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>)
List pgsql-hackers

Hi Dipesh,

The rebased and updated patch is attached. It's rebased on commit (9f2c4ede).


+typedef struct
+{
...
+} BackupFile;
+
+typedef struct
+{
...
+} BackupState;

These structures need comments.
Done.
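
For illustration only (the patch's actual fields are elided above, and the
names here are hypothetical), a minimal sketch of what the commented
structures might look like; the review point is simply that each struct
and field carries a comment:

typedef struct
{
	char		path[MAXPGPATH];	/* file path, relative to the data directory */
	int64		size;				/* file size in bytes */
} BackupFile;

typedef struct
{
	XLogRecPtr	startptr;			/* WAL location at backup start */
	TimeLineID	tli;				/* timeline the backup started on */
	slock_t		lock;				/* protects mutable fields below */
} BackupState;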
 

+list_wal_files_opt_list:
+                       SCONST SCONST
                                {
-                                 $$ = makeDefElem("manifest_checksums",
-                                                  (Node *)makeString($2), -1);
+                                       $$ = list_make2(
+                                       makeDefElem("start_wal_location",
+                                               (Node *)makeString($2), -1),
+                                       makeDefElem("end_wal_location",
+                                               (Node *)makeString($2), -1));
+
                                }

This seems like an unnecessarily complicated parse representation. The
DefElems seem to be completely unnecessary here.

The startptr and endptr are now kept in shared state, so this command no longer
needs these two options. I have removed this rule entirely.


@@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
                set_ps_display(activitymsg);
        }

-       perform_base_backup(&opt);
+       switch (cmd->cmdtag)

So the design here is that SendBaseBackup() is now going to do a bunch
of things that are NOT sending a base backup? With no updates to the
comments of that function and no change to the process title it sets?

Okay. I have renamed the function and updated the comments.
 

-       return (manifest->buffile != NULL);
+       return (manifest && manifest->buffile != NULL);

Heck no. It appears that you didn't even bother reading the function
header comment.

Okay, I forgot to remove this check. In the backup manifest patch, the manifest_info
object is always available. Anyway, I have removed this check from the 003 patch
as well.
 

+ * Send a single resultset containing XLogRecPtr record (in text format)
+ * TimelineID and backup label.
  */
 static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
+                                        StringInfo label, char *backupid)

This just casually breaks wire protocol compatibility, which seems
completely unacceptable.

A non-parallel backup returns startptr and tli in the result set. START_BACKUP
returns startptr, tli, the backup label, and the backupid, so I had extended this
result set.

I have removed the changes from SendXlogRecPtrResult and added a separate
function just for returning the parallel-backup result set.
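
For illustration, a rough sketch of what such a dedicated function could look
like, modeled on the existing SendXlogRecPtrResult(); the function name and
column set are assumptions based on the description above, not the patch's
actual code:

static void
SendBackupStartResult(XLogRecPtr ptr, TimeLineID tli,
                      StringInfo label, char *backupid)
{
    static const char *colnames[] = {"recptr", "tli", "label", "backupid"};
    StringInfoData buf;
    char        str[MAXFNAMELEN];
    int         i;

    /* RowDescription: four text columns */
    pq_beginmessage(&buf, 'T');
    pq_sendint16(&buf, 4);
    for (i = 0; i < 4; i++)
    {
        pq_sendstring(&buf, colnames[i]);
        pq_sendint32(&buf, 0);          /* table oid */
        pq_sendint16(&buf, 0);          /* attnum */
        pq_sendint32(&buf, TEXTOID);    /* type oid */
        pq_sendint16(&buf, -1);         /* typlen */
        pq_sendint32(&buf, 0);          /* typmod */
        pq_sendint16(&buf, 0);          /* format code: text */
    }
    pq_endmessage(&buf);

    /* DataRow: the four values, all sent as text */
    pq_beginmessage(&buf, 'D');
    pq_sendint16(&buf, 4);

    snprintf(str, sizeof(str), "%X/%X", (uint32) (ptr >> 32), (uint32) ptr);
    pq_sendint32(&buf, strlen(str));
    pq_sendbytes(&buf, str, strlen(str));

    snprintf(str, sizeof(str), "%u", tli);
    pq_sendint32(&buf, strlen(str));
    pq_sendbytes(&buf, str, strlen(str));

    pq_sendint32(&buf, label->len);
    pq_sendbytes(&buf, label->data, label->len);

    pq_sendint32(&buf, strlen(backupid));
    pq_sendbytes(&buf, backupid, strlen(backupid));

    pq_endmessage(&buf);

    /* CommandComplete */
    pq_puttextmessage('C', "SELECT");
}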
 

+       if (strlen(opt->tablespace) > 0)
+               sendTablespace(opt->tablespace, NULL, true, NULL, &files);
+       else
+               sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
+
+       SendFilesHeader(files);

So I guess the idea here is that we buffer the entire list of files in
memory, regardless of size, and then we send it out afterwards. That
doesn't seem like a good idea. The list of files might be very large.
We probably need some code refactoring here rather than just piling
more and more different responsibilities onto sendTablespace() and
sendDir().

I don't foresee memory being a challenge here. Assuming a database containing 10240
relation files (which can reach up to 10 TB in size), the list will occupy approximately
102 MB of memory. That can obviously be reduced, but it doesn't seem too bad either.
One way of doing it is to fetch a smaller set of files at a time, with the client requesting
the next set once the current one has been processed; perhaps fetch the list per
tablespace initially, and request the next tablespace once the current one is done
(a sketch follows).
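
As an illustration of that idea, a hypothetical client-side loop that fetches the
list one tablespace at a time. The SEND_FILE_LIST command and the
add_to_worklist() helper are made up for this sketch; the actual grammar may
end up different:

/* fetch the file list per tablespace instead of all at once */
for (i = 0; i < ntablespaces; i++)
{
    char        query[64];
    PGresult   *res;
    int         row;

    snprintf(query, sizeof(query), "SEND_FILE_LIST %u", tablespaceoids[i]);
    res = PQexec(conn, query);
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "could not fetch file list: %s",
                PQerrorMessage(conn));
        exit(1);
    }

    /* hand this tablespace's files to the workers before fetching more */
    for (row = 0; row < PQntuples(res); row++)
        add_to_worklist(PQgetvalue(res, row, 0));

    PQclear(res);
}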

Currently, basebackup only does compression on the client side, so I suggest we stick
with the existing behavior. On the other thread, you mentioned that the backend should
send the tarballs and that the server should decide which files go into each tarball. I
believe the current design can accommodate that easily if it's the client deciding the
files per tarball. The current design can also accommodate server-side compression
and encryption with minimal changes. Is there a point I’m overlooking here?

 

+       if (state->parallel_mode)
+               SpinLockAcquire(&state->lock);
+
+       state->throttling_counter += increment;
+
+       if (state->parallel_mode)
+               SpinLockRelease(&state->lock);

I don't like this much. It seems to me that we would do better to use
atomics here all the time, instead of conditional spinlocks.

Okay, I have made throttling_counter atomic. However, a lock is still required
for the wraparound, throttling_counter %= throttling_sample.
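
A sketch of that approach, assuming throttling_counter is a pg_atomic_uint64
in the shared state:

/* the increment itself is now lock-free */
pg_atomic_fetch_add_u64(&state->throttling_counter, increment);

/*
 * The wraparound reads and rewrites the same value, so that step still
 * takes the spinlock to happen as a unit.
 */
SpinLockAcquire(&state->lock);
if (pg_atomic_read_u64(&state->throttling_counter) >= (uint64) throttling_sample)
	pg_atomic_sub_fetch_u64(&state->throttling_counter,
							(uint64) throttling_sample);
SpinLockRelease(&state->lock);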

+static void
+send_file(basebackup_options *opt, char *file, bool missing_ok)
...
+       if (file == NULL)
+               return;

That seems totally inappropriate.

Removed.


+                       sendFile(file, file + basepathlen, &statbuf, true,
+                                InvalidOid, NULL, NULL);

Maybe I'm misunderstanding, but this looks like it's going to write a
tar header, even though we're not writing a tarfile.

sendFile() always sends files with a tar header included, even if the backup mode
is plain; pg_basebackup expects the same. That's the current behavior of the system.
Otherwise, we would have to duplicate this function, and it would do pretty much
the same thing except for the tar header.

+               else
+                       ereport(WARNING,
+                                       (errmsg("skipping special file or directory \"%s\"",
+                                                       file)));

So, if the user asks for a directory or symlink, what's going to
happen is that they're going to receive an empty file, and get a
warning. That sounds like terrible behavior.

I removed the warning, and an error is now raised if anything other than a regular file
is requested.
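
A sketch of that check, assuming statbuf was filled in by an earlier stat() on
the requested file:

/* refuse anything that is not a regular file */
if (!S_ISREG(statbuf.st_mode))
	ereport(ERROR,
			(errcode(ERRCODE_WRONG_OBJECT_TYPE),
			 errmsg("file \"%s\" is not a regular file", file)));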

 

+       /*
+        * Check for checksum failures. If there are failures across multiple
+        * processes it may not report total checksum count, but it will error
+        * out, terminating the backup.
+        */

In other words, the patch breaks the feature. Not that the feature in
question works particularly well as things stand, but this makes it
worse.

I added an atomic uint64 total_checksum_failures to the shared state to keep the
total count across workers, so it will have the same behavior as the current code.
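
A sketch of how that could look, assuming total_checksum_failures is a
pg_atomic_uint64 in the shared state and checksum_failures is each worker's
local count:

/* each worker folds its own failures into the shared counter */
pg_atomic_add_fetch_u64(&state->total_checksum_failures,
						(uint64) checksum_failures);

/* once all workers are done, the leader reads and reports the total */
uint64		total = pg_atomic_read_u64(&state->total_checksum_failures);

if (total > 1)
	ereport(WARNING,
			(errmsg("%lld total checksum verification failures",
					(long long) total)));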
 

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
