On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Sure. Though the backup manifest patch calculates and includes the checksum of backup files and is done
> while the file is being transferred to the frontend-end. The manifest file itself is copied at the
> very end of the backup. In parallel backup, I need the list of filenames before file contents are transferred, in
> order to divide them into multiple workers. For that, the manifest file has to be available when START_BACKUP
> is called.
>
> That means, backup manifest should support its creation while excluding the checksum during START_BACKUP().
> I also need the directory information as well for two reasons:
>
> - In plain format, base path has to exist before we can write the file. we can extract the base path from the file
> but doing that for all files does not seem a good idea.
> - base backup does not include the content of some directories but those directories although empty, are still
> expected in PGDATA.
>
> I can make these changes part of parallel backup (which would be on top of backup manifest patch) or
> these changes can be done as part of manifest patch and then parallel can use them.
>
> Robert what do you suggest?
I think we should probably not use backup manifests here, actually. I initially thought that would be a good idea, but after further thought it seems like it just complicates the code to no real benefit. I suggest that the START_BACKUP command just return a result set, like a query, with perhaps four columns: file name, file type ('d' for directory or 'f' for file), file size, file mtime. pg_basebackup will ignore the mtime, but some other tools might find that useful information.
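To make the four-column idea concrete, here is a rough client-side sketch in Python (not the server code, and not what any patch actually does). FileEntry, directories() and split_among_workers() are hypothetical names invented for illustration. It assumes the rows arrive as described above, picks out the directories so they can be created first, and divides the files among workers by size.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileEntry:
    """One row of the hypothetical file-list result set."""
    name: str   # path relative to the data directory
    ftype: str  # 'd' for directory, 'f' for file
    size: int   # size in bytes (0 for directories here)
    mtime: int  # modification time; this sketch ignores it


def directories(rows: List[FileEntry]) -> List[str]:
    """Directories to create up front. Sorting puts parents before children."""
    return sorted(r.name for r in rows if r.ftype == "d")


def split_among_workers(rows: List[FileEntry], nworkers: int) -> List[List[FileEntry]]:
    """Greedy split: largest file first, always to the least-loaded worker."""
    heap = [(0, w) for w in range(nworkers)]  # (bytes assigned, worker index)
    heapq.heapify(heap)
    buckets: List[List[FileEntry]] = [[] for _ in range(nworkers)]
    files = sorted((r for r in rows if r.ftype == "f"),
                   key=lambda r: r.size, reverse=True)
    for f in files:
        load, w = heapq.heappop(heap)
        buckets[w].append(f)
        heapq.heappush(heap, (load + f.size, w))
    return buckets
```

Because empty directories show up as their own 'd' rows, the client does not have to derive base paths from file names, which covers both reasons given in the quoted mail.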
I wonder if we should also split START_BACKUP (which should enter non-exclusive backup mode) from GET_FILE_LIST, in case some other client program wants to use one of those but not the other. I think that's probably a good idea, but I'm not sure.
I still think that the files should be requested one at a time, not a huge long list in a single command.
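As a rough illustration of the one-file-at-a-time approach, the Python sketch below has each worker pull the next file name from a shared queue and issue one request for it. fetch_file is a hypothetical callable standing in for whatever command ends up sending a single file; nothing here reflects an existing protocol command.

```python
import queue
import threading
from typing import Callable, Dict, List


def run_workers(names: List[str], nworkers: int,
                fetch_file: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Fetch every file in `names`, one request per file, using nworkers threads."""
    pending: "queue.Queue[str]" = queue.Queue()
    for name in names:
        pending.put(name)

    results: Dict[str, bytes] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                name = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to request
            data = fetch_file(name)  # hypothetical: one request, one file
            with lock:
                results[name] = data

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With this shape, no single command has to carry a long list of names, and a worker that draws a large file simply takes fewer files overall.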
What about having an API that can fetch either a single file or a list of files? We would use the
single-file form in our application, and other tools could benefit from the list form.