Thread: [HACKERS] [patch] reorder tablespaces in basebackup tar stream forbackup_label

[HACKERS] [patch] reorder tablespaces in basebackup tar stream forbackup_label

From
Michael Banck
Date:
Hi,

currently, the backup_label and (I think) the tablespace_map files are
(by design) conveniently located at the beginning of the main tablespace
tarball when making a basebackup. However, (by accident or also by
design?) the main tablespace is the last tarball[1] to be streamed via
the BASE_BACKUP replication protocol command. 

For pg_basebackup, this is not a real problem, as either each tablespace
is its own tarfile (in tar format mode), or the files are extracted
anyway (in plain mode).

However, third party tools using the BASE_BACKUP command might want to
extract the backup_label, e.g. in order to figure out the START WAL
LOCATION. If they make a big tarball for the whole cluster potentially
including all external tablespaces, then the backup_label file is
somewhere in the middle of it and it takes a long time for tar to
extract it.

So I am proposing the attached patch, which sends the base tablespace
first, and then all the other external tablespaces afterwards, thus
having base_backup be the first file in the tar in all cases. Does
anybody see a problem with that?


Michael

[1] Chapter 52.3 of the documentation says "one or more CopyResponse
results will be sent, one for the main data directory and one for each
additional tablespace other than pg_default and pg_global.", which makes
it sound like the main data directory is first, but in my testing, this
is not the case.

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
Am Dienstag, den 21.02.2017, 11:17 +0100 schrieb Michael Banck:
> However, third party tools using the BASE_BACKUP command might want
> to
> extract the backup_label, e.g. in order to figure out the START WAL
> LOCATION. If they make a big tarball for the whole cluster
> potentially
> including all external tablespaces, then the backup_label file is
> somewhere in the middle of it and it takes a long time for tar to
> extract it.
> 
> So I am proposing the attached patch, which sends the base tablespace
> first, and then all the other external tablespaces afterwards, thus
> having base_backup be the first file in the tar in all cases. Does
> anybody see a problem with that?
> 
> 
> Michael
> 
> [1] Chapter 52.3 of the documentation says "one or more CopyResponse
> results will be sent, one for the main data directory and one for
> each
> additional tablespace other than pg_default and pg_global.", which
> makes
> it sound like the main data directory is first, but in my testing,
> this
> is not the case.

The comment in the code says explicitely to add the base directory to
the end of the list, not sure if that is out of a certain reason.

I'd say this is an oversight in the implementation. I'm currently
working on a tool using the streaming protocol directly and i've
understood it exactly the way, that the default tablespace is the first
one in the stream.

So +1 for the patch.



On Tue, Feb 21, 2017 at 3:47 PM, Michael Banck
<michael.banck@credativ.de> wrote:
> currently, the backup_label and (I think) the tablespace_map files are
> (by design) conveniently located at the beginning of the main tablespace
> tarball when making a basebackup. However, (by accident or also by
> design?) the main tablespace is the last tarball[1] to be streamed via
> the BASE_BACKUP replication protocol command.
>
> For pg_basebackup, this is not a real problem, as either each tablespace
> is its own tarfile (in tar format mode), or the files are extracted
> anyway (in plain mode).
>
> However, third party tools using the BASE_BACKUP command might want to
> extract the backup_label, e.g. in order to figure out the START WAL
> LOCATION. If they make a big tarball for the whole cluster potentially
> including all external tablespaces, then the backup_label file is
> somewhere in the middle of it and it takes a long time for tar to
> extract it.
>
> So I am proposing the attached patch, which sends the base tablespace
> first, and then all the other external tablespaces afterwards, thus
> having base_backup be the first file in the tar in all cases. Does
> anybody see a problem with that?

Please add this to commitfest.postgresql.org so it doesn't get forgotten.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [patch] reorder tablespaces in basebackup tar streamfor backup_label

From
Michael Banck
Date:
Hi,

Am Sonntag, den 26.02.2017, 22:25 +0530 schrieb Robert Haas:
> On Tue, Feb 21, 2017 at 3:47 PM, Michael Banck
> <michael.banck@credativ.de> wrote:
> > So I am proposing the attached patch, which sends the base tablespace
> > first, and then all the other external tablespaces afterwards, thus
> > having base_backup be the first file in the tar in all cases. Does
> > anybody see a problem with that?
> 
> Please add this to commitfest.postgresql.org so it doesn't get forgotten.

Right, I was going to do that and have done so now, thanks for the
reminder.

Also, attached is a slightly changed patch which expands on the reason
of the change in the comment.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
On Wed, Feb 22, 2017 at 9:23 PM, Bernd Helmle <mailings@oopsware.de> wrote:
> The comment in the code says explicitely to add the base directory to
> the end of the list, not sure if that is out of a certain reason.
>
> I'd say this is an oversight in the implementation. I'm currently
> working on a tool using the streaming protocol directly and i've
> understood it exactly the way, that the default tablespace is the first
> one in the stream.
>
> So +1 for the patch.

Commit 507069de has switched the main directory from the beginning to
the end of the list, and the thread about this commit is here:
https://www.postgresql.org/message-id/AANLkTikgmZRkBuQ%2B_hcwPBv7Cd7xW48Ev%3DUBHA-k4v0W%40mail.gmail.com

+       /* Add a node for the base directory at the beginning.  This way, the
+        * backup_label file is always the first file to be sent. */       ti = palloc0(sizeof(tablespaceinfo));
ti->size= opt->progress ? sendDir(".", 1, true, tablespaces,
 
true) : -1;
-       tablespaces = lappend(tablespaces, ti);
+       tablespaces = lcons(ti, tablespaces);
So, the main directory is located at the end on purpose. When using
--wal-method=fetch the WAL segments are part of the main tarball, so
if you send the main tarball first you would need to generate a second
tarball with the WAL segments that have been generated between the
moment the main tarball has finished until the end of the last
tablespace taken if you want to have a consistent backup. Your patch
would work with the stream mode though.
-- 
Michael



Am Freitag, den 03.03.2017, 15:44 +0900 schrieb Michael Paquier:
> So, the main directory is located at the end on purpose. When using
> --wal-method=fetch the WAL segments are part of the main tarball, so
> if you send the main tarball first you would need to generate a
> second
> tarball with the WAL segments that have been generated between the
> moment the main tarball has finished until the end of the last
> tablespace taken if you want to have a consistent backup. Your patch
> would work with the stream mode though.

Ah right, i assumed there must be something, otherwise the comment
won't be there ;)

We could special case that part to distinguish fetch/stream mode, but i
fear that leads to more confusion than it wants to fix. The other
option of a separate tar file looks awkward from a usability point of
view.



On Sat, Mar 4, 2017 at 1:13 AM, Bernd Helmle <mailings@oopsware.de> wrote:
> Ah right, i assumed there must be something, otherwise the comment
> won't be there ;)
>
> We could special case that part to distinguish fetch/stream mode, but i
> fear that leads to more confusion than it wants to fix. The other
> option of a separate tar file looks awkward from a usability point of
> view.

It is possible as well to group the WAL files in a different tarball
than the main directory and put that at the tail of the list for the
fetch mode, while the main directory gets at the head. That would
break things for existing users by the way, and just being able to
look at the LSN start location before receiving the tar bytes of the
backup does not seem enough to justify such a breakage.
-- 
Michael



On 3/4/17 2:20 AM, Michael Paquier wrote:
> On Sat, Mar 4, 2017 at 1:13 AM, Bernd Helmle <mailings@oopsware.de> wrote:
>> Ah right, i assumed there must be something, otherwise the comment
>> won't be there ;)
>>
>> We could special case that part to distinguish fetch/stream mode, but i
>> fear that leads to more confusion than it wants to fix. The other
>> option of a separate tar file looks awkward from a usability point of
>> view.
> 
> It is possible as well to group the WAL files in a different tarball
> than the main directory and put that at the tail of the list for the
> fetch mode, while the main directory gets at the head. That would
> break things for existing users by the way, and just being able to
> look at the LSN start location before receiving the tar bytes of the
> backup does not seem enough to justify such a breakage.

This thread is stalled and it looks like the patch may not be workable,
at least in the current form.

I will mark this a "Returned with Feedback" on 2017-03-17 unless there
are arguments to the contrary.

Thanks,
-- 
-David
david@pgmasters.net



On Tue, Mar 14, 2017 at 11:34 PM, David Steele <david@pgmasters.net> wrote:
> This thread is stalled and it looks like the patch may not be workable,
> at least in the current form.
>
> I will mark this a "Returned with Feedback" on 2017-03-17 unless there
> are arguments to the contrary.

Or even rejected would be fine. I am also not sure if changing the
number of tarballs of the fetch mode is worth breaking to get a look
earlier at the backup_label file, still the patch as it stands is no
good.
-- 
Michael



Hi,

sorry, it took me a while to get back to this.

Am Freitag, den 03.03.2017, 15:44 +0900 schrieb Michael Paquier:
> On Wed, Feb 22, 2017 at 9:23 PM, Bernd Helmle <mailings@oopsware.de> wrote:
> > The comment in the code says explicitely to add the base directory to
> > the end of the list, not sure if that is out of a certain reason.
> >
> > I'd say this is an oversight in the implementation. I'm currently
> > working on a tool using the streaming protocol directly and i've
> > understood it exactly the way, that the default tablespace is the first
> > one in the stream.
> >
> > So +1 for the patch.
> 
> Commit 507069de has switched the main directory from the beginning to
> the end of the list, and the thread about this commit is here:
> https://www.postgresql.org/message-id/AANLkTikgmZRkBuQ%2B_hcwPBv7Cd7xW48Ev%3DUBHA-k4v0W%40mail.gmail.com
> 
> +       /* Add a node for the base directory at the beginning.  This way, the
> +        * backup_label file is always the first file to be sent. */
>         ti = palloc0(sizeof(tablespaceinfo));
>         ti->size = opt->progress ? sendDir(".", 1, true, tablespaces,
> true) : -1;
> -       tablespaces = lappend(tablespaces, ti);
> +       tablespaces = lcons(ti, tablespaces);
> So, the main directory is located at the end on purpose. When using
> --wal-method=fetch the WAL segments are part of the main tarball, so
> if you send the main tarball first you would need to generate a second
> tarball with the WAL segments that have been generated between the
> moment the main tarball has finished until the end of the last
> tablespace taken if you want to have a consistent backup. 

Ah, thanks for pointing that out, I've missed that in my testing.

> Your patch would work with the stream mode though.

Or, if not requesting the "WAL" option of the replication protocol's
BASE_BACKUP command.

I agree it doesn't make sense to start messing with fetch mode, but I
don't think we guarantee any ordering of tablespaces (to wit, Bernd was
pretty sure it was the other way around all the time), neither do I
think having the main tablespace be the first for non-WAL/stream and the
last for WAL/fetch would be confusing personally, though I understand
this is debatable.

So I've updated the patch to only switch the main tablespace to be first
in case WAL isn't included, please find it attached.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
On Fri, Mar 17, 2017 at 3:38 AM, Michael Banck
<michael.banck@credativ.de> wrote:
>> Your patch would work with the stream mode though.
>
> Or, if not requesting the "WAL" option of the replication protocol's
> BASE_BACKUP command.
>
> I agree it doesn't make sense to start messing with fetch mode, but I
> don't think we guarantee any ordering of tablespaces (to wit, Bernd was
> pretty sure it was the other way around all the time), neither do I
> think having the main tablespace be the first for non-WAL/stream and the
> last for WAL/fetch would be confusing personally, though I understand
> this is debatable.

From the docs:
https://www.postgresql.org/docs/9.6/static/protocol-replication.html
"After the second regular result set, one or more CopyResponse results
will be sent, one for the main data directory and one for each
additional tablespace other than pg_default and pg_global."
So yes there is no guarantee about the order tablespaces are sent.

> So I've updated the patch to only switch the main tablespace to be first
> in case WAL isn't included, please find it attached.

-       /* Add a node for the base directory at the end */
+       /* Add a node for the base directory, either at the end or (if
+        * WAL is not included) at the beginning (if WAL is not
+        * included).  This means the backup_label file is the first
+        * file to be sent in the latter case. */       ti = palloc0(sizeof(tablespaceinfo));       ti->size =
opt->progress? sendDir(".", 1, true, tablespaces,
 
true) : -1;
-       tablespaces = lappend(tablespaces, ti);
+       if (opt->includewal)
+           tablespaces = lappend(tablespaces, ti);
+       else
+           tablespaces = lcons(ti, tablespaces);
The comment block format is incorrect. I would think as well that this
comment should say it is important to have the main tablespace listed
last it includes the WAL segments, and those need to contain all the
latest WAL segments for a consistent backup.

FWIW, I have no issue with changing the ordering of backups the way
you are proposing here as long as the comment of this code path is
clear.
-- 
Michael



Hi,

Am Freitag, den 17.03.2017, 10:50 +0900 schrieb Michael Paquier:
> The comment block format is incorrect. I would think as well that this
> comment should say it is important to have the main tablespace listed
> last it includes the WAL segments, and those need to contain all the
> latest WAL segments for a consistent backup.

How about the attached? The comment now reads as follows:

|Add a node for the base directory. If WAL is included, the base
|directory has to be last as the WAL files get appended to it. If WAL
|is not included, send the base directory first, so that the
|backup_label file is the first file to be sent.

> FWIW, I have no issue with changing the ordering of backups the way
> you are proposing here as long as the comment of this code path is
> clear.

OK, great, let's see what the committers think then.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
On Fri, Mar 17, 2017 at 7:18 PM, Michael Banck
<michael.banck@credativ.de> wrote:
> Hi,
>
> Am Freitag, den 17.03.2017, 10:50 +0900 schrieb Michael Paquier:
>> The comment block format is incorrect. I would think as well that this
>> comment should say it is important to have the main tablespace listed
>> last it includes the WAL segments, and those need to contain all the
>> latest WAL segments for a consistent backup.
>
> How about the attached? The comment now reads as follows:
>
> |Add a node for the base directory. If WAL is included, the base
> |directory has to be last as the WAL files get appended to it. If WAL
> |is not included, send the base directory first, so that the
> |backup_label file is the first file to be sent.

Close enough, still not that:
https://www.postgresql.org/docs/current/static/source-format.html

>> FWIW, I have no issue with changing the ordering of backups the way
>> you are proposing here as long as the comment of this code path is
>> clear.
>
> OK, great, let's see what the committers think then.

Still that's a minor point, so I am making that ready for committer.
-- 
Michael



Hi,

Am Freitag, den 17.03.2017, 20:46 +0900 schrieb Michael Paquier:
> On Fri, Mar 17, 2017 at 7:18 PM, Michael Banck
> <michael.banck@credativ.de> wrote:
> > Am Freitag, den 17.03.2017, 10:50 +0900 schrieb Michael Paquier:
> >> The comment block format is incorrect. I would think as well that this
> >> comment should say it is important to have the main tablespace listed
> >> last it includes the WAL segments, and those need to contain all the
> >> latest WAL segments for a consistent backup.
> >
> > How about the attached? The comment now reads as follows:
> >
> > |Add a node for the base directory. If WAL is included, the base
> > |directory has to be last as the WAL files get appended to it. If WAL
> > |is not included, send the base directory first, so that the
> > |backup_label file is the first file to be sent.
> 
> Close enough, still not that:
> https://www.postgresql.org/docs/current/static/source-format.html

Ouch, three times a charm then I guess. I hope the attached is finally
correct, thanks for bearing with me.

> >> FWIW, I have no issue with changing the ordering of backups the way
> >> you are proposing here as long as the comment of this code path is
> >> clear.
> >
> > OK, great, let's see what the committers think then.
> 
> Still that's a minor point, so I am making that ready for committer.

Thanks!


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [patch] reorder tablespaces in basebackup tar streamfor backup_label

From
Fujii Masao
Date:
On Tue, Feb 21, 2017 at 7:17 PM, Michael Banck
<michael.banck@credativ.de> wrote:
> Hi,
>
> currently, the backup_label and (I think) the tablespace_map files are
> (by design) conveniently located at the beginning of the main tablespace
> tarball when making a basebackup. However, (by accident or also by
> design?) the main tablespace is the last tarball[1] to be streamed via
> the BASE_BACKUP replication protocol command.
>
> For pg_basebackup, this is not a real problem, as either each tablespace
> is its own tarfile (in tar format mode), or the files are extracted
> anyway (in plain mode).
>
> However, third party tools using the BASE_BACKUP command might want to
> extract the backup_label, e.g. in order to figure out the START WAL
> LOCATION.

The first result set that BASE BACKUP sends back to a client contains
the START WAL LOCATION. So at first, ISTM that you don't need backup_label
for that purpose.

If your need other information except START WAL LOCATION at the beginning of
base backup and they are very useful for many third-party softwares,
you can add them into that first result set. If you do this, you can
retrieve them
at the beginning even when WAL files are included in the backup.

Regards,

-- 
Fujii Masao



Re: [patch] reorder tablespaces in basebackup tar streamfor backup_label

From
Michael Paquier
Date:
On Wed, Mar 29, 2017 at 3:56 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> If your need other information except START WAL LOCATION at the beginning of
> base backup and they are very useful for many third-party softwares,
> you can add them into that first result set. If you do this, you can
> retrieve them
> at the beginning even when WAL files are included in the backup.

You mean in the result tuple of pg_start_backup(), right? Why not.
-- 
Michael



Re: [patch] reorder tablespaces in basebackup tar streamfor backup_label

From
Michael Banck
Date:
Hi,

Am Mittwoch, den 29.03.2017, 15:22 +0900 schrieb Michael Paquier:
> On Wed, Mar 29, 2017 at 3:56 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > If your need other information except START WAL LOCATION at the beginning of
> > base backup and they are very useful for many third-party softwares,
> > you can add them into that first result set. If you do this, you can
> > retrieve them
> > at the beginning even when WAL files are included in the backup.
> 
> You mean in the result tuple of pg_start_backup(), right? Why not.

The replication protocol chapter says: "When the backup is started, the
server will first send two ordinary result sets, followed by one or more
CopyResponse results. The first ordinary result set contains the
starting position of the backup, in a single row with two columns."

However, I don't think it is very obvious to users (or at least it is
not to me) how to get at this from psql, if you want to script it.  If I
run something like 'psql "dbname=postgres replication=database" -c
"BASE_BACKUP" "> foo', I seem to only get the tar file(s), unless I am
missing something. 

The reason one might not want to run pg_basebackup directly is that this
way one can pipe the output to an external program, e.g. to better
compress it; this is not possible with pg_basebackup if you additional
tablespaces.

So I think that has some merit on its own.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer




Re: [patch] reorder tablespaces in basebackup tar streamfor backup_label

From
Kyotaro HORIGUCHI
Date:
Hello,

At Wed, 29 Mar 2017 09:23:42 +0200, Michael Banck <michael.banck@credativ.de> wrote in
<1490772222.18436.14.camel@credativ.de>
> Hi,
> 
> Am Mittwoch, den 29.03.2017, 15:22 +0900 schrieb Michael Paquier:
> > On Wed, Mar 29, 2017 at 3:56 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > > If your need other information except START WAL LOCATION at the beginning of
> > > base backup and they are very useful for many third-party softwares,
> > > you can add them into that first result set. If you do this, you can
> > > retrieve them
> > > at the beginning even when WAL files are included in the backup.
> > 
> > You mean in the result tuple of pg_start_backup(), right? Why not.
> 
> The replication protocol chapter says: "When the backup is started, the
> server will first send two ordinary result sets, followed by one or more
> CopyResponse results. The first ordinary result set contains the
> starting position of the backup, in a single row with two columns."
> 
> However, I don't think it is very obvious to users (or at least it is
> not to me) how to get at this from psql, if you want to script it.  If I
> run something like 'psql "dbname=postgres replication=database" -c
> "BASE_BACKUP" "> foo', I seem to only get the tar file(s), unless I am
> missing something. 

Interesting. The protocol documentation seems fine to me but
maybe the example is perplexing. psql always prints only the last
result set for a query. So your query resembling the example in
the page gives only unrecognizable dump.

A little modification to psql prints the follwing but anyway
modifying psql to handle BASE_BACKUP like this doesn't seem
reasonable.

$ psql "dbname=postgres replication=database port=5433" -c "BASE_BACKUP"  | head recptr   | tli 
-----------+-----0/C000028 |   1
(1 row)
spcoid | spclocation | size 
--------+-------------+------       |             |     
(1 row)

backup_labelgres0000000E)
CHECKPOINT LOCATION: 0/E000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2017-03-29 17:32:16 JST
LABEL: base backup
....

> The reason one might not want to run pg_basebackup directly is that this
> way one can pipe the output to an external program, e.g. to better
> compress it; this is not possible with pg_basebackup if you additional
> tablespaces.
> 
> So I think that has some merit on its own.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [patch] reorder tablespaces in basebackup tar streamfor backup_label

From
Fujii Masao
Date:
On Wed, Mar 29, 2017 at 5:36 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> At Wed, 29 Mar 2017 09:23:42 +0200, Michael Banck <michael.banck@credativ.de> wrote in
<1490772222.18436.14.camel@credativ.de>
>> Hi,
>>
>> Am Mittwoch, den 29.03.2017, 15:22 +0900 schrieb Michael Paquier:
>> > On Wed, Mar 29, 2017 at 3:56 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> > > If your need other information except START WAL LOCATION at the beginning of
>> > > base backup and they are very useful for many third-party softwares,
>> > > you can add them into that first result set. If you do this, you can
>> > > retrieve them
>> > > at the beginning even when WAL files are included in the backup.
>> >
>> > You mean in the result tuple of pg_start_backup(), right? Why not.
>>
>> The replication protocol chapter says: "When the backup is started, the
>> server will first send two ordinary result sets, followed by one or more
>> CopyResponse results. The first ordinary result set contains the
>> starting position of the backup, in a single row with two columns."
>>
>> However, I don't think it is very obvious to users (or at least it is
>> not to me) how to get at this from psql, if you want to script it.  If I

I don't think that using psql to run BASE_BACKUP command is good
approach. Instead, IMO you basically should implement the client program
which can handle BASE_BACKUP properly, or extend pg_basebackup
so that you can do what you want to do.

Regards,

-- 
Fujii Masao



Re: [patch] reorder tablespaces in basebackup tar streamfor backup_label

From
Michael Paquier
Date:
On Fri, Mar 31, 2017 at 12:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I don't think that using psql to run BASE_BACKUP command is good
> approach.

That's basically what pg_basebackup is made for, and it makes sure
that some sanity checks and measures happen.

> Instead, IMO you basically should implement the client program
> which can handle BASE_BACKUP properly, or extend pg_basebackup
> so that you can do what you want to do.

In my first reviews of the patch, I completely forgot the fact that
BASE_BACKUP does send the start LSN of the backup in the first result
set, so the patch proposed is actually rather useless because the data
you are looking for is already at hand. If more data would be
interesting to have, like the start timestamp number, we could just
extend the first result set a bit as Fujii-san is coming at. Let's
drop this patch and move on.
-- 
Michael



Re: [patch] reorder tablespaces in basebackup tar streamfor backup_label

From
Kyotaro HORIGUCHI
Date:
At Fri, 31 Mar 2017 13:37:38 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqQU1H=PAG3XBYFFg9w+emeQFYAdkep7cWVeJukXtB_m_Q@mail.gmail.com>
> In my first reviews of the patch, I completely forgot the fact that
> BASE_BACKUP does send the start LSN of the backup in the first result
> set, so the patch proposed is actually rather useless because the data
> you are looking for is already at hand. If more data would be
> interesting to have, like the start timestamp number, we could just
> extend the first result set a bit as Fujii-san is coming at. Let's
> drop this patch and move on.

+1 for dropping this.

But I think we should edit the documentation a bit.

I don't fully understand those who want to handle it by a script,
but the documentation seems to be suggesting that something like
is possible. So it might be better add a description like that or
just remove the example.

"psql doesn't handle this protocol properly. The instances of theusage of these protocols are found in the source code
ofwalreceiverand pg_basebackup."
 

That being said, pg_basebackup is straightforward but
unfortunately, walrecever.c seems a bit hard to read for those
who unaccustomed to PostgresSQL source code.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [patch] reorder tablespaces in basebackup tar streamfor backup_label

From
Michael Banck
Date:
On Fri, Mar 31, 2017 at 02:11:44PM +0900, Kyotaro HORIGUCHI wrote:
> At Fri, 31 Mar 2017 13:37:38 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqQU1H=PAG3XBYFFg9w+emeQFYAdkep7cWVeJukXtB_m_Q@mail.gmail.com>
> > In my first reviews of the patch, I completely forgot the fact that
> > BASE_BACKUP does send the start LSN of the backup in the first result
> > set, so the patch proposed is actually rather useless because the data
> > you are looking for is already at hand. If more data would be
> > interesting to have, like the start timestamp number, we could just
> > extend the first result set a bit as Fujii-san is coming at. Let's
> > drop this patch and move on.
> 
> +1 for dropping this.

I've marked it as "Rejected" now, as that is clearly the consensus.
> But I think we should edit the documentation a bit.
> 
> I don't fully understand those who want to handle it by a script,
> but the documentation seems to be suggesting that something like
> is possible. So it might be better add a description like that or
> just remove the example.

The documentation says "For the purpose of testing replication
commands", one could use psql and "IDENTIFY_SYSTEM", it doesn't exactly
suggest to run BASE_BACKUP this way.

Which, by the way, appears to be working totally fine from psql, modulo
the problem that you won't get to the starting transaction location.

> "psql doesn't handle this protocol properly. The instances of the
>  usage of these protocols are found in the source code of
>  walreceiver and pg_basebackup."

Something along those lines makes sense I think, yeah.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer