Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers

From Jeevan Chalke
Subject Re: WIP/PoC for parallel backup
Date
Msg-id CAM2+6=UYH799S2qZ4V=f7EK1Y1HzFQKx8GnQWw2gEET1J=mZhQ@mail.gmail.com
Whole thread Raw
In response to Re: WIP/PoC for parallel backup  (Asif Rehman <asifr.rehman@gmail.com>)
Responses Re: WIP/PoC for parallel backup  (Robert Haas <robertmhaas@gmail.com>)
Re: WIP/PoC for parallel backup  (Asif Rehman <asifr.rehman@gmail.com>)
List pgsql-hackers


On Wed, Nov 13, 2019 at 7:04 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

Sorry, I sent the wrong patches. Please see the correct version of the patches (_v6).

Review comments on these patches:

1.
+    XLogRecPtr    wal_location;

Looking at the other field names in basebackup_options structure, let's use
wallocation instead. Or better startwallocation to be precise.

2.
+    int32        size;

Should we use size_t here?

3.
I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
Can't we return the file list with START_BACKUP itself?

4.
+        else if (
+#ifndef WIN32
+                 S_ISLNK(statbuf.st_mode)
+#else
+                 pgwin32_is_junction(pathbuf)
+#endif
+            )
+        {
+            /*
+             * If symlink, write it as a directory. file symlinks only allowed
+             * in pg_tblspc
+             */
+            statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+            _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+        }

In normal backup mode, we skip the special file which is not a regular file or
a directory or a symlink inside pg_tblspc. But in your patch, above code,
treats it as a directory. Should parallel backup too skip such special files?

5.
Please keep header file inclusions in alphabetical order in basebackup.c and
pg_basebackup.c

6.
+        /*
+         * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
+         * 'base/1/1245/32683', ...) [options]
+         */

Please update these comments as we fetch one file at a time.

7.
+backup_file:
+            SCONST                            { $$ = (Node *) makeString($1); }
+            ;
+

Instead of having this rule with only one constant terminal, we can use
SCONST directly in backup_files_list. However, I don't see any issue with
this approach either, just trying to reduce the rules.

8.
Please indent code within 80 char limit at all applicable places.

9.
Please fix following typos:

identifing => identifying
optionaly => optionally
structre => structure
progrsss => progress
Retrive => Retrieve
direcotries => directories


=====

The other mail thread related to backup manifest [1], is creating a
backup_manifest file and sends that to the client which has optional
checksum and other details including filename, file size, mtime, etc.
There is a patch on the same thread which is then validating the backup too.

Since this patch too gets a file list from the server and has similar
details (except checksum), can somehow parallel backup use the backup-manifest
infrastructure from that patch?

When the parallel backup is in use, will there be a backup_manifest file
created too? I am just visualizing what will be the scenario when both these
features are checked-in.

[1] https://www.postgresql.org/message-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


Thanks
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [patch]socket_timeout in interfaces/libpq
Next
From: Amit Khandekar
Date:
Subject: Re: logical decoding : exceeded maxAllocatedDescs for .spill files