Thread: pg_stat_progress_basebackup - progress reporting for pg_basebackup,in the server side
pg_stat_progress_basebackup - progress reporting for pg_basebackup,in the server side
From
Fujii Masao
Date:
Hi, pg_basebackup reports the backup progress if --progress option is specified, and we can monitor it in the client side. I think that it's useful if we can monitor the progress information also in the server side because, for example, we can easily check that by using SQL. So I'd like to propose pg_stat_progress_basebackup view that allows us to monitor the progress of pg_basebackup, in the server side. Thought? POC patch is attached. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Kyotaro Horiguchi
Date:
At Wed, 29 Jan 2020 23:16:08 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > Hi, > > pg_basebackup reports the backup progress if --progress option is > specified, > and we can monitor it in the client side. I think that it's useful if > we can > monitor the progress information also in the server side because, for > example, > we can easily check that by using SQL. So I'd like to propose > pg_stat_progress_basebackup view that allows us to monitor the > progress > of pg_basebackup, in the server side. Thought? > > POC patch is attached. | postgres=# \d pg_stat_progress_basebackup | View "pg_catalog.pg_stat_progress_basebackup" | Column | Type | Collation | Nullable | Default | ---------------------+---------+-----------+----------+--------- | pid | integer | | | | phase | text | | | | backup_total | bigint | | | | backup_streamed | bigint | | | | tablespace_total | bigint | | | | tablespace_streamed | bigint | | | I think the view needs client identity such like host/port pair besides pid (host/port pair fails identify client in the case of unix-sockets.). Also elapsed time from session start might be useful. pg_stat_progress_acuum has datid, datname and relid. + if (backup_total > 0 && backup_streamed > backup_total) + { + backup_total = backup_streamed; Do we need the condition "backup_total > 0"? + int tblspc_streamed = 0; + + pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE, + PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP); This starts "streaming backup" phase with backup_total = 0. Coudln't we move to that phase after setting backup total and tablespace total? That is, just after calling SendBackupHeader(). + WHEN 3 THEN 'stopping backup'::text I'm not sure, but the "stop" seems suggesting the backup is terminated before completion. If it is following the name of the function pg_stop_backup, I think the name is suggesting to stop "the state for performing backup", not a backup. In the first place, the progress is about "backup" so it seems strange that we have another phase after the "stopping backup" phase. It might be better that it is "finishing file transfer" or such. "initializing" -> "starting file transfer" -> "transferring files" -> "finishing file transfer" -> "transaferring WAL" regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/01/30 12:58, Kyotaro Horiguchi wrote: > At Wed, 29 Jan 2020 23:16:08 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >> Hi, >> >> pg_basebackup reports the backup progress if --progress option is >> specified, >> and we can monitor it in the client side. I think that it's useful if >> we can >> monitor the progress information also in the server side because, for >> example, >> we can easily check that by using SQL. So I'd like to propose >> pg_stat_progress_basebackup view that allows us to monitor the >> progress >> of pg_basebackup, in the server side. Thought? >> >> POC patch is attached. > > | postgres=# \d pg_stat_progress_basebackup > | View "pg_catalog.pg_stat_progress_basebackup" > | Column | Type | Collation | Nullable | Default > | ---------------------+---------+-----------+----------+--------- > | pid | integer | | | > | phase | text | | | > | backup_total | bigint | | | > | backup_streamed | bigint | | | > | tablespace_total | bigint | | | > | tablespace_streamed | bigint | | | > > I think the view needs client identity such like host/port pair > besides pid (host/port pair fails identify client in the case of > unix-sockets.). I don't think this is job of a progress reporting. If those information is necessary, we can join this view with pg_stat_replication using pid column as the join key. > Also elapsed time from session start might be > useful. +1 I think that not only pg_stat_progress_basebackup but also all the other progress views should report the time when the target command was started and the time when the phase was last changed. Those times would be useful to estimate the remaining execution time from the progress infromation. > pg_stat_progress_acuum has datid, datname and relid. > > + if (backup_total > 0 && backup_streamed > backup_total) > + { > + backup_total = backup_streamed; > > Do we need the condition "backup_total > 0"? I added the condition for the case where --progress option is not supplied in pg_basebackup, i.e., the case where the total amount of backup is not estimated at the beginning of the backup. In this case, total_backup is always 0. > + int tblspc_streamed = 0; > + > + pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE, > + PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP); > > This starts "streaming backup" phase with backup_total = 0. Coudln't > we move to that phase after setting backup total and tablespace total? > That is, just after calling SendBackupHeader(). OK, that's a bit less confusing for users. I will change in that way. > + WHEN 3 THEN 'stopping backup'::text > > I'm not sure, but the "stop" seems suggesting the backup is terminated > before completion. If it is following the name of the function > pg_stop_backup, I think the name is suggesting to stop "the state for > performing backup", not a backup. > > In the first place, the progress is about "backup" so it seems strange > that we have another phase after the "stopping backup" phase. It > might be better that it is "finishing file transfer" or such. > > "initializing" > -> "starting file transfer" > -> "transferring files" > -> "finishing file transfer" > -> "transaferring WAL" Better name is always welcome! If "stopping back" is confusing, what about "performing pg_stop_backup"? So initializing performing pg_start_backup streaming database files performing pg_stop_backup transfering WAL files Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Masahiko Sawada
Date:
On Fri, 31 Jan 2020 at 02:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/01/30 12:58, Kyotaro Horiguchi wrote: > > At Wed, 29 Jan 2020 23:16:08 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > >> Hi, > >> > >> pg_basebackup reports the backup progress if --progress option is > >> specified, > >> and we can monitor it in the client side. I think that it's useful if > >> we can > >> monitor the progress information also in the server side because, for > >> example, > >> we can easily check that by using SQL. So I'd like to propose > >> pg_stat_progress_basebackup view that allows us to monitor the > >> progress > >> of pg_basebackup, in the server side. Thought? > >> > >> POC patch is attached. > > > > | postgres=# \d pg_stat_progress_basebackup > > | View "pg_catalog.pg_stat_progress_basebackup" > > | Column | Type | Collation | Nullable | Default > > | ---------------------+---------+-----------+----------+--------- > > | pid | integer | | | > > | phase | text | | | > > | backup_total | bigint | | | > > | backup_streamed | bigint | | | > > | tablespace_total | bigint | | | > > | tablespace_streamed | bigint | | | > > > > I think the view needs client identity such like host/port pair > > besides pid (host/port pair fails identify client in the case of > > unix-sockets.). > > I don't think this is job of a progress reporting. If those information > is necessary, we can join this view with pg_stat_replication using > pid column as the join key. > > > Also elapsed time from session start might be > > useful. > > +1 > I think that not only pg_stat_progress_basebackup but also all the other > progress views should report the time when the target command was > started and the time when the phase was last changed. Those times > would be useful to estimate the remaining execution time from the > progress infromation. > > > pg_stat_progress_acuum has datid, datname and relid. > > > > + if (backup_total > 0 && backup_streamed > backup_total) > > + { > > + backup_total = backup_streamed; > > > > Do we need the condition "backup_total > 0"? > > I added the condition for the case where --progress option is not supplied > in pg_basebackup, i.e., the case where the total amount of backup is not > estimated at the beginning of the backup. In this case, total_backup is > always 0. > > > + int tblspc_streamed = 0; > > + > > + pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE, > > + PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP); > > > > This starts "streaming backup" phase with backup_total = 0. Coudln't > > we move to that phase after setting backup total and tablespace total? > > That is, just after calling SendBackupHeader(). > > OK, that's a bit less confusing for users. I will change in that way. > > > + WHEN 3 THEN 'stopping backup'::text > > > > I'm not sure, but the "stop" seems suggesting the backup is terminated > > before completion. If it is following the name of the function > > pg_stop_backup, I think the name is suggesting to stop "the state for > > performing backup", not a backup. > > > > In the first place, the progress is about "backup" so it seems strange > > that we have another phase after the "stopping backup" phase. It > > might be better that it is "finishing file transfer" or such. > > > > "initializing" > > -> "starting file transfer" > > -> "transferring files" > > -> "finishing file transfer" > > -> "transaferring WAL" > > Better name is always welcome! If "stopping back" is confusing, > what about "performing pg_stop_backup"? So > > initializing > performing pg_start_backup > streaming database files > performing pg_stop_backup > transfering WAL files Another idea I came up with is to show steps that take time instead of pg_start_backup/pg_stop_backup, for better understanding the situation. That is, "performing checkpoint" and "performing WAL archive" etc, which engage the most of the time of these functions. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/02/02 14:59, Masahiko Sawada wrote: > On Fri, 31 Jan 2020 at 02:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/01/30 12:58, Kyotaro Horiguchi wrote: >>> At Wed, 29 Jan 2020 23:16:08 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >>>> Hi, >>>> >>>> pg_basebackup reports the backup progress if --progress option is >>>> specified, >>>> and we can monitor it in the client side. I think that it's useful if >>>> we can >>>> monitor the progress information also in the server side because, for >>>> example, >>>> we can easily check that by using SQL. So I'd like to propose >>>> pg_stat_progress_basebackup view that allows us to monitor the >>>> progress >>>> of pg_basebackup, in the server side. Thought? >>>> >>>> POC patch is attached. >>> >>> | postgres=# \d pg_stat_progress_basebackup >>> | View "pg_catalog.pg_stat_progress_basebackup" >>> | Column | Type | Collation | Nullable | Default >>> | ---------------------+---------+-----------+----------+--------- >>> | pid | integer | | | >>> | phase | text | | | >>> | backup_total | bigint | | | >>> | backup_streamed | bigint | | | >>> | tablespace_total | bigint | | | >>> | tablespace_streamed | bigint | | | >>> >>> I think the view needs client identity such like host/port pair >>> besides pid (host/port pair fails identify client in the case of >>> unix-sockets.). >> >> I don't think this is job of a progress reporting. If those information >> is necessary, we can join this view with pg_stat_replication using >> pid column as the join key. >> >>> Also elapsed time from session start might be >>> useful. >> >> +1 >> I think that not only pg_stat_progress_basebackup but also all the other >> progress views should report the time when the target command was >> started and the time when the phase was last changed. Those times >> would be useful to estimate the remaining execution time from the >> progress infromation. >> >>> pg_stat_progress_acuum has datid, datname and relid. >>> >>> + if (backup_total > 0 && backup_streamed > backup_total) >>> + { >>> + backup_total = backup_streamed; >>> >>> Do we need the condition "backup_total > 0"? >> >> I added the condition for the case where --progress option is not supplied >> in pg_basebackup, i.e., the case where the total amount of backup is not >> estimated at the beginning of the backup. In this case, total_backup is >> always 0. >> >>> + int tblspc_streamed = 0; >>> + >>> + pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE, >>> + PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP); >>> >>> This starts "streaming backup" phase with backup_total = 0. Coudln't >>> we move to that phase after setting backup total and tablespace total? >>> That is, just after calling SendBackupHeader(). >> >> OK, that's a bit less confusing for users. I will change in that way. Fixed. Attached is the updated version of the patch. I also fixed the regression test failure. >> >>> + WHEN 3 THEN 'stopping backup'::text >>> >>> I'm not sure, but the "stop" seems suggesting the backup is terminated >>> before completion. If it is following the name of the function >>> pg_stop_backup, I think the name is suggesting to stop "the state for >>> performing backup", not a backup. >>> >>> In the first place, the progress is about "backup" so it seems strange >>> that we have another phase after the "stopping backup" phase. It >>> might be better that it is "finishing file transfer" or such. >>> >>> "initializing" >>> -> "starting file transfer" >>> -> "transferring files" >>> -> "finishing file transfer" >>> -> "transaferring WAL" >> >> Better name is always welcome! If "stopping back" is confusing, >> what about "performing pg_stop_backup"? So >> >> initializing >> performing pg_start_backup >> streaming database files >> performing pg_stop_backup >> transfering WAL files > > Another idea I came up with is to show steps that take time instead of > pg_start_backup/pg_stop_backup, for better understanding the > situation. That is, "performing checkpoint" and "performing WAL > archive" etc, which engage the most of the time of these functions. Yeah, that's an idea. ISTM that "waiting for WAL archiving" sounds better than "performing WAL archive". Thought? I've not applied this change in the patch yet, but if there is no other idea, I'd like to adopt this. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Mon, Feb 3, 2020 at 1:17 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > On 2020/02/02 14:59, Masahiko Sawada wrote: > > On Fri, 31 Jan 2020 at 02:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> On 2020/01/30 12:58, Kyotaro Horiguchi wrote: > >>> + WHEN 3 THEN 'stopping backup'::text > >>> > >>> I'm not sure, but the "stop" seems suggesting the backup is terminated > >>> before completion. If it is following the name of the function > >>> pg_stop_backup, I think the name is suggesting to stop "the state for > >>> performing backup", not a backup. > >>> > >>> In the first place, the progress is about "backup" so it seems strange > >>> that we have another phase after the "stopping backup" phase. It > >>> might be better that it is "finishing file transfer" or such. > >>> > >>> "initializing" > >>> -> "starting file transfer" > >>> -> "transferring files" > >>> -> "finishing file transfer" > >>> -> "transaferring WAL" > >> > >> Better name is always welcome! If "stopping back" is confusing, > >> what about "performing pg_stop_backup"? So > >> > >> initializing > >> performing pg_start_backup > >> streaming database files > >> performing pg_stop_backup > >> transfering WAL files > > > > Another idea I came up with is to show steps that take time instead of > > pg_start_backup/pg_stop_backup, for better understanding the > > situation. That is, "performing checkpoint" and "performing WAL > > archive" etc, which engage the most of the time of these functions. > > Yeah, that's an idea. ISTM that "waiting for WAL archiving" sounds > better than "performing WAL archive". Thought? > I've not applied this change in the patch yet, but if there is no > other idea, I'd like to adopt this. If we are trying to "pg_stop_backup" in phase name, maybe we should avoid "pg_start_backup"? Then maybe: initializing starting backup / waiting for [ backup start ] checkpoint to finish transferring database files finishing backup / waiting for WAL archiving to finish transferring WAL files ? Some comments on documentation changes in v2 patch: + Amount of data already streamed. "already" may be redundant. For example, in pg_start_progress_vacuum, heap_blks_scanned is described as "...blocks scanned", not "...blocks already scanned". + <entry><structfield>tablespace_total</structfield></entry> + <entry><structfield>tablespace_streamed</structfield></entry> Better to use plural tablespaces_total and tablespaces_streamed for consistency? + The WAL sender process is currently performing + <function>pg_start_backup</function> and setting up for + making a base backup. How about "taking" instead of "making" in the above sentence? - <varlistentry> + <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP"> I don't see any new text in the documentation patch that uses above xref, so no need to define it? Thanks, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Mon, Feb 3, 2020 at 4:28 PM Amit Langote <amitlangote09@gmail.com> wrote: > If we are trying to "pg_stop_backup" in phase name, maybe we should > avoid "pg_start_backup"? Then maybe: Sorry, I meant to write: If we are trying to avoid "pg_stop_backup" in phase name, maybe we should avoid "pg_start_backup"? Then maybe: Thanks, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/02/03 16:28, Amit Langote wrote: > On Mon, Feb 3, 2020 at 1:17 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> On 2020/02/02 14:59, Masahiko Sawada wrote: >>> On Fri, 31 Jan 2020 at 02:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>> On 2020/01/30 12:58, Kyotaro Horiguchi wrote: >>>>> + WHEN 3 THEN 'stopping backup'::text >>>>> >>>>> I'm not sure, but the "stop" seems suggesting the backup is terminated >>>>> before completion. If it is following the name of the function >>>>> pg_stop_backup, I think the name is suggesting to stop "the state for >>>>> performing backup", not a backup. >>>>> >>>>> In the first place, the progress is about "backup" so it seems strange >>>>> that we have another phase after the "stopping backup" phase. It >>>>> might be better that it is "finishing file transfer" or such. >>>>> >>>>> "initializing" >>>>> -> "starting file transfer" >>>>> -> "transferring files" >>>>> -> "finishing file transfer" >>>>> -> "transaferring WAL" >>>> >>>> Better name is always welcome! If "stopping back" is confusing, >>>> what about "performing pg_stop_backup"? So >>>> >>>> initializing >>>> performing pg_start_backup >>>> streaming database files >>>> performing pg_stop_backup >>>> transfering WAL files >>> >>> Another idea I came up with is to show steps that take time instead of >>> pg_start_backup/pg_stop_backup, for better understanding the >>> situation. That is, "performing checkpoint" and "performing WAL >>> archive" etc, which engage the most of the time of these functions. >> >> Yeah, that's an idea. ISTM that "waiting for WAL archiving" sounds >> better than "performing WAL archive". Thought? >> I've not applied this change in the patch yet, but if there is no >> other idea, I'd like to adopt this. > > If we are trying to "pg_stop_backup" in phase name, maybe we should > avoid "pg_start_backup"? Then maybe: > > initializing > starting backup / waiting for [ backup start ] checkpoint to finish > transferring database files > finishing backup / waiting for WAL archiving to finish > transferring WAL files So we now have the following ideas about the phase names for pg_basebackup. 1. initializing 2. 2-1. starting backup 2-2. starting file transfer 2-3. performing pg_start_backup 2-4. performing checkpoint 2-5. waiting for [ backup start ] checkpoint to finish 3. 3-1. streaming backup 3-2. transferring database files 3-3. streaming database files 3-4. transferring files 4. 4-1. stopping backup 4-2. finishing file transfer 4-3. performing pg_stop_backup 4-4. finishing backup 4-5. waiting for WAL archiving to finish 4-6. performing WAL archive 5. 1. transferring wal 2. transferring WAL files What conbination of these do you prefer? > Some comments on documentation changes in v2 patch: > > + Amount of data already streamed. Ok, fixed. > "already" may be redundant. For example, in pg_start_progress_vacuum, > heap_blks_scanned is described as "...blocks scanned", not "...blocks > already scanned". > > + <entry><structfield>tablespace_total</structfield></entry> > + <entry><structfield>tablespace_streamed</structfield></entry> > > Better to use plural tablespaces_total and tablespaces_streamed for consistency? Fixed. > + The WAL sender process is currently performing > + <function>pg_start_backup</function> and setting up for > + making a base backup. > > How about "taking" instead of "making" in the above sentence? Fixed. Attached is the updated version of the patch. > > - <varlistentry> > + <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP"> > > I don't see any new text in the documentation patch that uses above > xref, so no need to define it? The following description that I added uses this. certain commands during command execution. Currently, the only commands which support progress reporting are <command>ANALYZE</command>, <command>CLUSTER</command>, - <command>CREATE INDEX</command>, and <command>VACUUM</command>. + <command>CREATE INDEX</command>, <command>VACUUM</command>, + and <xref linkend="protocol-replication-base-backup"/> (i.e., replication + command that <xref linkend="app-pgbasebackup"/> issues to take + a base backup). Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Mon, Feb 3, 2020 at 11:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > So we now have the following ideas about the phase names for pg_basebackup. > > 1. > initializing > > 2. > 2-1. starting backup > 2-2. starting file transfer > 2-3. performing pg_start_backup > 2-4. performing checkpoint > 2-5. waiting for [ backup start ] checkpoint to finish > > 3. > 3-1. streaming backup > 3-2. transferring database files > 3-3. streaming database files > 3-4. transferring files > > 4. > 4-1. stopping backup > 4-2. finishing file transfer > 4-3. performing pg_stop_backup > 4-4. finishing backup > 4-5. waiting for WAL archiving to finish > 4-6. performing WAL archive > > 5. > 1. transferring wal > 2. transferring WAL files > > What conbination of these do you prefer? I like: 1. initializing 2-5 waiting for backup start checkpoint to finish 3-3 streaming database files 4-5 waiting for wal archiving to finish 5-1 transferring wal (or streaming wal) > > - <varlistentry> > > + <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP"> > > > > I don't see any new text in the documentation patch that uses above > > xref, so no need to define it? > > The following description that I added uses this. > > certain commands during command execution. Currently, the only commands > which support progress reporting are <command>ANALYZE</command>, > <command>CLUSTER</command>, > - <command>CREATE INDEX</command>, and <command>VACUUM</command>. > + <command>CREATE INDEX</command>, <command>VACUUM</command>, > + and <xref linkend="protocol-replication-base-backup"/> (i.e., replication > + command that <xref linkend="app-pgbasebackup"/> issues to take > + a base backup). Sorry, I missed that. I was mistakenly expecting a different value of linkend. Some comments on v3: + <entry>Process ID of a WAL sender process.</entry> "a" sounds redundant. Maybe: of this WAL sender process or of WAL sender process Reading this: + <entry><structfield>backup_total</structfield></entry> + <entry><type>bigint</type></entry> + <entry> + Total amount of data that will be streamed. If progress reporting + is not enabled in <application>pg_basebackup</application> + (i.e., <literal>--progress</literal> option is not specified), + this is <literal>0</literal>. I wonder how hard would it be to change basebackup.c to always set backup_total, irrespective of whether --progress is specified with pg_basebackup or not? It seems quite misleading to leave it set to 0, because one may panic that they have lost their data, that is, if they haven't first read the documentation. Thanks, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Tue, Feb 4, 2020 at 10:34 AM Amit Langote <amitlangote09@gmail.com> wrote: > Reading this: > > + <entry><structfield>backup_total</structfield></entry> > + <entry><type>bigint</type></entry> > + <entry> > + Total amount of data that will be streamed. If progress reporting > + is not enabled in <application>pg_basebackup</application> > + (i.e., <literal>--progress</literal> option is not specified), > + this is <literal>0</literal>. > > I wonder how hard would it be to change basebackup.c to always set > backup_total, irrespective of whether --progress is specified with > pg_basebackup or not? It seems quite misleading to leave it set to 0, > because one may panic that they have lost their data, that is, if they > haven't first read the documentation. For example, the attached patch changes basebackup.c to always set tablespaceinfo.size, irrespective of whether --progress was passed with BASE_BACKUP command. It passes make check-world, so maybe safe. Maybe it would be a good idea to add a couple of more comments around tablespaceinfo struct definition, such as how 'size' is to be interpreted. Thanks, Amit
Attachment
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/02/04 10:34, Amit Langote wrote: > On Mon, Feb 3, 2020 at 11:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> So we now have the following ideas about the phase names for pg_basebackup. >> >> 1. >> initializing >> >> 2. >> 2-1. starting backup >> 2-2. starting file transfer >> 2-3. performing pg_start_backup >> 2-4. performing checkpoint >> 2-5. waiting for [ backup start ] checkpoint to finish >> >> 3. >> 3-1. streaming backup >> 3-2. transferring database files >> 3-3. streaming database files >> 3-4. transferring files >> >> 4. >> 4-1. stopping backup >> 4-2. finishing file transfer >> 4-3. performing pg_stop_backup >> 4-4. finishing backup >> 4-5. waiting for WAL archiving to finish >> 4-6. performing WAL archive >> >> 5. >> 1. transferring wal >> 2. transferring WAL files >> >> What conbination of these do you prefer? > > I like: Thanks for reviewing the patch! > 1. initializing > 2-5 waiting for backup start checkpoint to finish Can we shorten this to "waiting for checkpoint"? IMO the simpler phase name is better and "to finish" sounds a bit redundant. Also in the description of pg_stat_progress_create_index, basically "waiting for xxx" is used. > 3-3 streaming database files > 4-5 waiting for wal archiving to finish Can we shorten this to "waiting for wal archiving" because of the same reason as above? > 5-1 transferring wal (or streaming wal) IMO "transferring wal" is better because this phase happens only when "--wal-method=fetch" is specified in pg_basebackup. "streaming wal" seems to implie "--wal-method=stream", instead. >>> - <varlistentry> >>> + <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP"> >>> >>> I don't see any new text in the documentation patch that uses above >>> xref, so no need to define it? >> >> The following description that I added uses this. >> >> certain commands during command execution. Currently, the only commands >> which support progress reporting are <command>ANALYZE</command>, >> <command>CLUSTER</command>, >> - <command>CREATE INDEX</command>, and <command>VACUUM</command>. >> + <command>CREATE INDEX</command>, <command>VACUUM</command>, >> + and <xref linkend="protocol-replication-base-backup"/> (i.e., replication >> + command that <xref linkend="app-pgbasebackup"/> issues to take >> + a base backup). > > Sorry, I missed that. I was mistakenly expecting a different value of linkend. > > Some comments on v3: > > + <entry>Process ID of a WAL sender process.</entry> > > "a" sounds redundant. Maybe: > > of this WAL sender process or > of WAL sender process I borrowed "Process ID of a WAL sender process" from the description of pg_stat_replication.pid. But if it's better to get rid of "a", I'm happy to do that! > Reading this: > > + <entry><structfield>backup_total</structfield></entry> > + <entry><type>bigint</type></entry> > + <entry> > + Total amount of data that will be streamed. If progress reporting > + is not enabled in <application>pg_basebackup</application> > + (i.e., <literal>--progress</literal> option is not specified), > + this is <literal>0</literal>. > > I wonder how hard would it be to change basebackup.c to always set > backup_total, irrespective of whether --progress is specified with > pg_basebackup or not? It seems quite misleading to leave it set to 0, > because one may panic that they have lost their data, that is, if they > haven't first read the documentation. Yeah, I understand your concern. The pg_basebackup document explains the risk when --progress is specified, as follows. Since I imagined that someone may explicitly disable --progress to avoid this risk, I made the server estimate the total size only when --progress is specified. But you think that this overhead by --progress is negligibly small? -------------------- When this is enabled, the backup will start by enumerating the size of the entire database, and then go back and send the actual contents. This may make the backup take slightly longer, and in particular it will take longer before the first data is sent. -------------------- If we really can always estimate the total size, whether --progress is specified or not, we should get rid of PROGRESS option from BASE_BACKUP replication command because it will no longer be necessary, I think. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > On 2020/02/04 10:34, Amit Langote wrote: > > I like: > > Thanks for reviewing the patch! > > > 1. initializing > > 2-5 waiting for backup start checkpoint to finish > > Can we shorten this to "waiting for checkpoint"? IMO the simpler > phase name is better and "to finish" sounds a bit redundant. Also > in the description of pg_stat_progress_create_index, basically > "waiting for xxx" is used. "waiting for checkpoint" works for me. > > 3-3 streaming database files > > 4-5 waiting for wal archiving to finish > > Can we shorten this to "waiting for wal archiving" because of > the same reason as above? Yes. > > 5-1 transferring wal (or streaming wal) > > IMO "transferring wal" is better because this phase happens only when > "--wal-method=fetch" is specified in pg_basebackup. "streaming wal" > seems to implie "--wal-method=stream", instead. Ah, okay, > > Reading this: > > > > + <entry><structfield>backup_total</structfield></entry> > > + <entry><type>bigint</type></entry> > > + <entry> > > + Total amount of data that will be streamed. If progress reporting > > + is not enabled in <application>pg_basebackup</application> > > + (i.e., <literal>--progress</literal> option is not specified), > > + this is <literal>0</literal>. > > > > I wonder how hard would it be to change basebackup.c to always set > > backup_total, irrespective of whether --progress is specified with > > pg_basebackup or not? It seems quite misleading to leave it set to 0, > > because one may panic that they have lost their data, that is, if they > > haven't first read the documentation. > > Yeah, I understand your concern. The pg_basebackup document explains > the risk when --progress is specified, as follows. Since I imagined that > someone may explicitly disable --progress to avoid this risk, I made > the server estimate the total size only when --progress is specified. > But you think that this overhead by --progress is negligibly small? > > -------------------- > When this is enabled, the backup will start by enumerating the size of > the entire database, and then go back and send the actual contents. > This may make the backup take slightly longer, and in particular it will > take longer before the first data is sent. > -------------------- Sorry, I hadn't read this before. So, my proposal would make this a lie. Still, if "streaming database files" is the longest phase, then not having even an approximation of how much data is to be streamed over doesn't much help estimating progress, at least as long as one only has this view to look at. That said, the overhead of checking the size before sending any data may be worse for some people than others, so having the option to avoid that might be good after all. Thanks, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Kyotaro Horiguchi
Date:
At Wed, 5 Feb 2020 16:29:54 +0900, Amit Langote <amitlangote09@gmail.com> wrote in > On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/02/04 10:34, Amit Langote wrote: > > > I like: > > > > Thanks for reviewing the patch! > > > > > 1. initializing > > > 2-5 waiting for backup start checkpoint to finish > > > > Can we shorten this to "waiting for checkpoint"? IMO the simpler > > phase name is better and "to finish" sounds a bit redundant. Also > > in the description of pg_stat_progress_create_index, basically > > "waiting for xxx" is used. > > "waiting for checkpoint" works for me. I'm not sure, but doesn't that mean "waiting for a checkpoint to start"? Sorry in advance if that is not the case. > > > 3-3 streaming database files > > > 4-5 waiting for wal archiving to finish > > > > Can we shorten this to "waiting for wal archiving" because of > > the same reason as above? > > Yes. > > > > 5-1 transferring wal (or streaming wal) > > > > IMO "transferring wal" is better because this phase happens only when > > "--wal-method=fetch" is specified in pg_basebackup. "streaming wal" > > seems to implie "--wal-method=stream", instead. > > Ah, okay, > > > > Reading this: > > > > > > + <entry><structfield>backup_total</structfield></entry> > > > + <entry><type>bigint</type></entry> > > > + <entry> > > > + Total amount of data that will be streamed. If progress reporting > > > + is not enabled in <application>pg_basebackup</application> > > > + (i.e., <literal>--progress</literal> option is not specified), > > > + this is <literal>0</literal>. > > > > > > I wonder how hard would it be to change basebackup.c to always set > > > backup_total, irrespective of whether --progress is specified with > > > pg_basebackup or not? It seems quite misleading to leave it set to 0, > > > because one may panic that they have lost their data, that is, if they > > > haven't first read the documentation. > > > > Yeah, I understand your concern. The pg_basebackup document explains > > the risk when --progress is specified, as follows. Since I imagined that > > someone may explicitly disable --progress to avoid this risk, I made > > the server estimate the total size only when --progress is specified. > > But you think that this overhead by --progress is negligibly small? > > > > -------------------- > > When this is enabled, the backup will start by enumerating the size of > > the entire database, and then go back and send the actual contents. > > This may make the backup take slightly longer, and in particular it will > > take longer before the first data is sent. > > -------------------- > > Sorry, I hadn't read this before. So, my proposal would make this a lie. > > Still, if "streaming database files" is the longest phase, then not > having even an approximation of how much data is to be streamed over > doesn't much help estimating progress, at least as long as one only > has this view to look at. > > That said, the overhead of checking the size before sending any data > may be worse for some people than others, so having the option to > avoid that might be good after all. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Wed, Feb 5, 2020 at 5:32 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > At Wed, 5 Feb 2020 16:29:54 +0900, Amit Langote <amitlangote09@gmail.com> wrote in > > On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > On 2020/02/04 10:34, Amit Langote wrote: > > > > I like: > > > > > > Thanks for reviewing the patch! > > > > > > > 1. initializing > > > > 2-5 waiting for backup start checkpoint to finish > > > > > > Can we shorten this to "waiting for checkpoint"? IMO the simpler > > > phase name is better and "to finish" sounds a bit redundant. Also > > > in the description of pg_stat_progress_create_index, basically > > > "waiting for xxx" is used. > > > > "waiting for checkpoint" works for me. > > I'm not sure, but doesn't that mean "waiting for a checkpoint to > start"? Sorry in advance if that is not the case. No, I really meant "to finish". As Sawada-san said upthread, we should really use text that describes the activity that usually takes long. While it takes takes only a moment to actually start the checkpoint, it might take long for it to finish. As Fujii-san says though we don't need the noise words "to finish". Thanks, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Wed, Feb 5, 2020 at 18:25 Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Feb 5, 2020 at 6:15 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> 2020年2月5日(水) 17:54 Amit Langote <amitlangote09@gmail.com>:
>>
>> > I'm not sure, but doesn't that mean "waiting for a checkpoint to
>> > start"? Sorry in advance if that is not the case.
>>
>> No, I really meant "to finish". As Sawada-san said upthread, we
>> should really use text that describes the activity that usually takes
>> long. While it takes takes only a moment to actually start the
>> checkpoint, it might take long for it to finish.
>
> I meant that the wording might sound as if it implies "to start", but..
Ah, I misunderstood then, sorry.
So, maybe you're saying that "waiting for checkpoint" is ambiguous and
most people will assume it means "...to start". As for me, I assume
it ends with "...to finish".
>> As Fujii-san says
>> though we don't need the noise words "to finish".
>
> Understood, sorry for my noise.
Actually, that's an important point to consider and we should strive
to use words that are unambiguous.
Last two messages weren’t sent to the list.
Thanks,
Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Kyotaro Horiguchi
Date:
At Wed, 5 Feb 2020 18:53:19 +0900, Amit Langote <amitlangote09@gmail.com> wrote in > Last two messages weren’t sent to the list. Oops! Sorry, I made a mistake sending the mail. > On Wed, Feb 5, 2020 at 18:25 Amit Langote <amitlangote09@gmail.com> wrote: > > > On Wed, Feb 5, 2020 at 6:15 PM Kyotaro Horiguchi > > <horikyota.ntt@gmail.com> wrote: > > > 2020年2月5日(水) 17:54 Amit Langote <amitlangote09@gmail.com>: > > >> > > So, maybe you're saying that "waiting for checkpoint" is ambiguous and > > most people will assume it means "...to start". As for me, I assume > > it ends with "...to finish". I'm not sure "most peple will assume" or not, so I said "I'm not sure". For example, I feel strangeness to use "I'm waiting for Amit" to express that I'm waiting Amit to leave there. That phrase gives me such kind of uneasiness. I thought of "establishing checkpoint" or "running a checkpoint" as other candidates. > > >> As Fujii-san says > > >> though we don't need the noise words "to finish". > > > > > > Understood, sorry for my noise. > > > > Actually, that's an important point to consider and we should strive > > to use words that are unambiguous. I think it's not ambiguous if knowing what happens during backup so my concern was not unambiguity, but that it might give feeling of strangeness that that sentense appears in that context. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Thu, Feb 6, 2020 at 9:51 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > On Wed, Feb 5, 2020 at 18:25 Amit Langote <amitlangote09@gmail.com> wrote: > > > So, maybe you're saying that "waiting for checkpoint" is ambiguous and > > > most people will assume it means "...to start". As for me, I assume > > > it ends with "...to finish". > > I'm not sure "most peple will assume" or not, so I said "I'm not > sure". For example, I feel strangeness to use "I'm waiting for Amit" > to express that I'm waiting Amit to leave there. That phrase gives me > such kind of uneasiness. > > I thought of "establishing checkpoint" or "running a checkpoint" as > other candidates. Okay, I understand. I am fine with "running checkpoint", although I think "waiting for checkpoint" isn't totally wrong either. Thanks, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Wed, Feb 5, 2020 at 4:29 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > Yeah, I understand your concern. The pg_basebackup document explains > > the risk when --progress is specified, as follows. Since I imagined that > > someone may explicitly disable --progress to avoid this risk, I made > > the server estimate the total size only when --progress is specified. > > But you think that this overhead by --progress is negligibly small? > > > > -------------------- > > When this is enabled, the backup will start by enumerating the size of > > the entire database, and then go back and send the actual contents. > > This may make the backup take slightly longer, and in particular it will > > take longer before the first data is sent. > > -------------------- > > Sorry, I hadn't read this before. So, my proposal would make this a lie. > > Still, if "streaming database files" is the longest phase, then not > having even an approximation of how much data is to be streamed over > doesn't much help estimating progress, at least as long as one only > has this view to look at. > > That said, the overhead of checking the size before sending any data > may be worse for some people than others, so having the option to > avoid that might be good after all. By the way, if calculating backup total size can take significantly long in some cases, that is when requested by specifying --progress, then it might be a good idea to define a separate phase for that, like "estimating backup size" or some such. Currently, it's part of "starting backup", which covers both running the checkpoint and size estimation which run one after another. I suspect people might never get stuck on "estimating backup size" as they might on "running checkpoint", which perhaps only strengthens the case for *always* calculating the size before sending the backup header. Thanks, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/02/06 11:07, Amit Langote wrote: > On Thu, Feb 6, 2020 at 9:51 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: >>> On Wed, Feb 5, 2020 at 18:25 Amit Langote <amitlangote09@gmail.com> wrote: >>>> So, maybe you're saying that "waiting for checkpoint" is ambiguous and >>>> most people will assume it means "...to start". As for me, I assume >>>> it ends with "...to finish". >> >> I'm not sure "most peple will assume" or not, so I said "I'm not >> sure". For example, I feel strangeness to use "I'm waiting for Amit" >> to express that I'm waiting Amit to leave there. That phrase gives me >> such kind of uneasiness. >> >> I thought of "establishing checkpoint" or "running a checkpoint" as >> other candidates. > > Okay, I understand. I am fine with "running checkpoint", although I > think "waiting for checkpoint" isn't totally wrong either. Yeah, but if "waiting for XXX" sounds a bit confusing to some people, I'm OK to back to "waiting for XXX to finish" that you originally proposed. Attached the updated version of the patch. This patch uses the following descriptions of the phases. waiting for checkpoint to finish estimating backup size streaming database files waiting for wal archiving to finish transferring wal files Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/02/06 11:35, Amit Langote wrote: > On Wed, Feb 5, 2020 at 4:29 PM Amit Langote <amitlangote09@gmail.com> wrote: >> On Wed, Feb 5, 2020 at 3:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>> Yeah, I understand your concern. The pg_basebackup document explains >>> the risk when --progress is specified, as follows. Since I imagined that >>> someone may explicitly disable --progress to avoid this risk, I made >>> the server estimate the total size only when --progress is specified. >>> But you think that this overhead by --progress is negligibly small? >>> >>> -------------------- >>> When this is enabled, the backup will start by enumerating the size of >>> the entire database, and then go back and send the actual contents. >>> This may make the backup take slightly longer, and in particular it will >>> take longer before the first data is sent. >>> -------------------- >> >> Sorry, I hadn't read this before. So, my proposal would make this a lie. >> >> Still, if "streaming database files" is the longest phase, then not >> having even an approximation of how much data is to be streamed over >> doesn't much help estimating progress, at least as long as one only >> has this view to look at. >> >> That said, the overhead of checking the size before sending any data >> may be worse for some people than others, so having the option to >> avoid that might be good after all. > > By the way, if calculating backup total size can take significantly > long in some cases, that is when requested by specifying --progress, > then it might be a good idea to define a separate phase for that, like > "estimating backup size" or some such. Currently, it's part of > "starting backup", which covers both running the checkpoint and size > estimation which run one after another. OK, I added this phase in the latest patch that I posted upthread. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Mon, Feb 17, 2020 at 10:00 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > On 2020/02/06 11:07, Amit Langote wrote: > > On Thu, Feb 6, 2020 at 9:51 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > >> I thought of "establishing checkpoint" or "running a checkpoint" as > >> other candidates. > > > > Okay, I understand. I am fine with "running checkpoint", although I > > think "waiting for checkpoint" isn't totally wrong either. > > Yeah, but if "waiting for XXX" sounds a bit confusing to some people, > I'm OK to back to "waiting for XXX to finish" that you originally > proposed. > > Attached the updated version of the patch. This patch uses the following > descriptions of the phases. > > waiting for checkpoint to finish > estimating backup size > streaming database files > waiting for wal archiving to finish > transferring wal files Thanks for the new patch. I noticed that there is missing </para> tag in the documentation changes: + <row> + <entry><literal>waiting for checkpoint to finish</literal></entry> + <entry> + The WAL sender process is currently performing + <function>pg_start_backup</function> to set up for + taking a base backup, and waiting for backup start + checkpoint to finish. + </entry> + <row> There should be a </row> between </entry> and <row> at the end of the hunk shown above. Sorry for not saying it before, but the following text needs revisiting: + <para> + Whenever <application>pg_basebackup</application> is taking a base + backup, the <structname>pg_stat_progress_basebackup</structname> + view will contain a row for each WAL sender process that is currently + running <command>BASE_BACKUP</command> replication command + and streaming the backup. I understand that you wrote "Whenever pg_basebackup is taking a backup...", because description of other views contains a similar starting line. But, it may not only be pg_basebackup that would be served by this view, no? It could be any tool that speaks Postgres' replication protocol and thus be able to send a BASE_BACKUP command. If that is correct, I would write something like "When an application is taking a backup" or some such without specific reference to pg_basebackup. Thoughts? Thanks, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/02/18 16:02, Amit Langote wrote: > On Mon, Feb 17, 2020 at 10:00 PM Fujii Masao > <masao.fujii@oss.nttdata.com> wrote: >> On 2020/02/06 11:07, Amit Langote wrote: >>> On Thu, Feb 6, 2020 at 9:51 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: >>>> I thought of "establishing checkpoint" or "running a checkpoint" as >>>> other candidates. >>> >>> Okay, I understand. I am fine with "running checkpoint", although I >>> think "waiting for checkpoint" isn't totally wrong either. >> >> Yeah, but if "waiting for XXX" sounds a bit confusing to some people, >> I'm OK to back to "waiting for XXX to finish" that you originally >> proposed. >> >> Attached the updated version of the patch. This patch uses the following >> descriptions of the phases. >> >> waiting for checkpoint to finish >> estimating backup size >> streaming database files >> waiting for wal archiving to finish >> transferring wal files > > Thanks for the new patch. Thanks for reviewing the patch! > I noticed that there is missing </para> tag in the documentation changes: Could you tell me where I should add </para> tag? > + <row> > + <entry><literal>waiting for checkpoint to finish</literal></entry> > + <entry> > + The WAL sender process is currently performing > + <function>pg_start_backup</function> to set up for > + taking a base backup, and waiting for backup start > + checkpoint to finish. > + </entry> > + <row> > > There should be a </row> between </entry> and <row> at the end of the > hunk shown above. Will fix. Thanks! > Sorry for not saying it before, but the following text needs revisiting: Of course, no problem. I'm happy to improve the patch! > + <para> > + Whenever <application>pg_basebackup</application> is taking a base > + backup, the <structname>pg_stat_progress_basebackup</structname> > + view will contain a row for each WAL sender process that is currently > + running <command>BASE_BACKUP</command> replication command > + and streaming the backup. > > I understand that you wrote "Whenever pg_basebackup is taking a > backup...", because description of other views contains a similar > starting line. But, it may not only be pg_basebackup that would be > served by this view, no? It could be any tool that speaks Postgres' > replication protocol and thus be able to send a BASE_BACKUP command. > If that is correct, I would write something like "When an application > is taking a backup" or some such without specific reference to > pg_basebackup. Thoughts? Yeah, there may be some such applications. But most users would use pg_basebackup, so getting rid of the reference to pg_basebackup would make the description a bit difficult-to-read. Also I can imagine that an user of those backup applications would get to know the progress reporting view from their documents. So I prefer the existing one or something like "Whenever an application like pg_basebackup ...". Thought? Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Tue, Feb 18, 2020 at 4:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > On 2020/02/18 16:02, Amit Langote wrote: > > I noticed that there is missing </para> tag in the documentation changes: > > Could you tell me where I should add </para> tag? > > > + <row> > > + <entry><literal>waiting for checkpoint to finish</literal></entry> > > + <entry> > > + The WAL sender process is currently performing > > + <function>pg_start_backup</function> to set up for > > + taking a base backup, and waiting for backup start > > + checkpoint to finish. > > + </entry> > > + <row> > > > > There should be a </row> between </entry> and <row> at the end of the > > hunk shown above. > > Will fix. Thanks! Just to clarify, that's the missing </para> tag I am talking about above. > > + <para> > > + Whenever <application>pg_basebackup</application> is taking a base > > + backup, the <structname>pg_stat_progress_basebackup</structname> > > + view will contain a row for each WAL sender process that is currently > > + running <command>BASE_BACKUP</command> replication command > > + and streaming the backup. > > > > I understand that you wrote "Whenever pg_basebackup is taking a > > backup...", because description of other views contains a similar > > starting line. But, it may not only be pg_basebackup that would be > > served by this view, no? It could be any tool that speaks Postgres' > > replication protocol and thus be able to send a BASE_BACKUP command. > > If that is correct, I would write something like "When an application > > is taking a backup" or some such without specific reference to > > pg_basebackup. Thoughts? > > Yeah, there may be some such applications. But most users would > use pg_basebackup, so getting rid of the reference to pg_basebackup > would make the description a bit difficult-to-read. Also I can imagine > that an user of those backup applications would get to know > the progress reporting view from their documents. So I prefer > the existing one or something like "Whenever an application like > pg_basebackup ...". Thought? Sure, "an application like pg_basebackup" sounds fine to me. Thanks, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/02/18 16:53, Amit Langote wrote: > On Tue, Feb 18, 2020 at 4:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> On 2020/02/18 16:02, Amit Langote wrote: >>> I noticed that there is missing </para> tag in the documentation changes: >> >> Could you tell me where I should add </para> tag? >> >>> + <row> >>> + <entry><literal>waiting for checkpoint to finish</literal></entry> >>> + <entry> >>> + The WAL sender process is currently performing >>> + <function>pg_start_backup</function> to set up for >>> + taking a base backup, and waiting for backup start >>> + checkpoint to finish. >>> + </entry> >>> + <row> >>> >>> There should be a </row> between </entry> and <row> at the end of the >>> hunk shown above. >> >> Will fix. Thanks! > > Just to clarify, that's the missing </para> tag I am talking about above. OK, so I added </row> tag just after the above </entry>. >>> + <para> >>> + Whenever <application>pg_basebackup</application> is taking a base >>> + backup, the <structname>pg_stat_progress_basebackup</structname> >>> + view will contain a row for each WAL sender process that is currently >>> + running <command>BASE_BACKUP</command> replication command >>> + and streaming the backup. >>> >>> I understand that you wrote "Whenever pg_basebackup is taking a >>> backup...", because description of other views contains a similar >>> starting line. But, it may not only be pg_basebackup that would be >>> served by this view, no? It could be any tool that speaks Postgres' >>> replication protocol and thus be able to send a BASE_BACKUP command. >>> If that is correct, I would write something like "When an application >>> is taking a backup" or some such without specific reference to >>> pg_basebackup. Thoughts? >> >> Yeah, there may be some such applications. But most users would >> use pg_basebackup, so getting rid of the reference to pg_basebackup >> would make the description a bit difficult-to-read. Also I can imagine >> that an user of those backup applications would get to know >> the progress reporting view from their documents. So I prefer >> the existing one or something like "Whenever an application like >> pg_basebackup ...". Thought? > > Sure, "an application like pg_basebackup" sounds fine to me. OK, I changed the doc that way. Attached the updated version of the patch. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Tue, Feb 18, 2020 at 9:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > OK, I changed the doc that way. Attached the updated version of the patch. Thank you. Looks good to me. Regards, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/02/19 11:22, Amit Langote wrote: > On Tue, Feb 18, 2020 at 9:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> OK, I changed the doc that way. Attached the updated version of the patch. > > Thank you. Looks good to me. Thanks for the review! You think that the patch can be marked as "ready for committer"? Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Wed, Feb 19, 2020 at 9:49 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > On 2020/02/19 11:22, Amit Langote wrote: > > On Tue, Feb 18, 2020 at 9:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> OK, I changed the doc that way. Attached the updated version of the patch. > > > > Thank you. Looks good to me. > > Thanks for the review! > You think that the patch can be marked as "ready for committer"? As far as I am concerned, yes. :) Thanks, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/02/18 21:31, Fujii Masao wrote: > > > On 2020/02/18 16:53, Amit Langote wrote: >> On Tue, Feb 18, 2020 at 4:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>> On 2020/02/18 16:02, Amit Langote wrote: >>>> I noticed that there is missing </para> tag in the documentation changes: >>> >>> Could you tell me where I should add </para> tag? >>> >>>> + <row> >>>> + <entry><literal>waiting for checkpoint to finish</literal></entry> >>>> + <entry> >>>> + The WAL sender process is currently performing >>>> + <function>pg_start_backup</function> to set up for >>>> + taking a base backup, and waiting for backup start >>>> + checkpoint to finish. >>>> + </entry> >>>> + <row> >>>> >>>> There should be a </row> between </entry> and <row> at the end of the >>>> hunk shown above. >>> >>> Will fix. Thanks! >> >> Just to clarify, that's the missing </para> tag I am talking about above. > > OK, so I added </row> tag just after the above </entry>. > >>>> + <para> >>>> + Whenever <application>pg_basebackup</application> is taking a base >>>> + backup, the <structname>pg_stat_progress_basebackup</structname> >>>> + view will contain a row for each WAL sender process that is currently >>>> + running <command>BASE_BACKUP</command> replication command >>>> + and streaming the backup. >>>> >>>> I understand that you wrote "Whenever pg_basebackup is taking a >>>> backup...", because description of other views contains a similar >>>> starting line. But, it may not only be pg_basebackup that would be >>>> served by this view, no? It could be any tool that speaks Postgres' >>>> replication protocol and thus be able to send a BASE_BACKUP command. >>>> If that is correct, I would write something like "When an application >>>> is taking a backup" or some such without specific reference to >>>> pg_basebackup. Thoughts? >>> >>> Yeah, there may be some such applications. But most users would >>> use pg_basebackup, so getting rid of the reference to pg_basebackup >>> would make the description a bit difficult-to-read. Also I can imagine >>> that an user of those backup applications would get to know >>> the progress reporting view from their documents. So I prefer >>> the existing one or something like "Whenever an application like >>> pg_basebackup ...". Thought? >> >> Sure, "an application like pg_basebackup" sounds fine to me. > > OK, I changed the doc that way. Attached the updated version of the patch. Attached is the updated version of the patch. The previous patch used only pgstat_progress_update_param() even when updating multiple values. Since those updates are not atomic, this can cause readers of the values to see the intermediate states. To avoid this issue, the latest patch uses pgstat_progress_update_multi_param(), instead. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/02/26 23:18, Fujii Masao wrote: > > > On 2020/02/18 21:31, Fujii Masao wrote: >> >> >> On 2020/02/18 16:53, Amit Langote wrote: >>> On Tue, Feb 18, 2020 at 4:42 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>> On 2020/02/18 16:02, Amit Langote wrote: >>>>> I noticed that there is missing </para> tag in the documentation changes: >>>> >>>> Could you tell me where I should add </para> tag? >>>> >>>>> + <row> >>>>> + <entry><literal>waiting for checkpoint to finish</literal></entry> >>>>> + <entry> >>>>> + The WAL sender process is currently performing >>>>> + <function>pg_start_backup</function> to set up for >>>>> + taking a base backup, and waiting for backup start >>>>> + checkpoint to finish. >>>>> + </entry> >>>>> + <row> >>>>> >>>>> There should be a </row> between </entry> and <row> at the end of the >>>>> hunk shown above. >>>> >>>> Will fix. Thanks! >>> >>> Just to clarify, that's the missing </para> tag I am talking about above. >> >> OK, so I added </row> tag just after the above </entry>. >> >>>>> + <para> >>>>> + Whenever <application>pg_basebackup</application> is taking a base >>>>> + backup, the <structname>pg_stat_progress_basebackup</structname> >>>>> + view will contain a row for each WAL sender process that is currently >>>>> + running <command>BASE_BACKUP</command> replication command >>>>> + and streaming the backup. >>>>> >>>>> I understand that you wrote "Whenever pg_basebackup is taking a >>>>> backup...", because description of other views contains a similar >>>>> starting line. But, it may not only be pg_basebackup that would be >>>>> served by this view, no? It could be any tool that speaks Postgres' >>>>> replication protocol and thus be able to send a BASE_BACKUP command. >>>>> If that is correct, I would write something like "When an application >>>>> is taking a backup" or some such without specific reference to >>>>> pg_basebackup. Thoughts? >>>> >>>> Yeah, there may be some such applications. But most users would >>>> use pg_basebackup, so getting rid of the reference to pg_basebackup >>>> would make the description a bit difficult-to-read. Also I can imagine >>>> that an user of those backup applications would get to know >>>> the progress reporting view from their documents. So I prefer >>>> the existing one or something like "Whenever an application like >>>> pg_basebackup ...". Thought? >>> >>> Sure, "an application like pg_basebackup" sounds fine to me. >> >> OK, I changed the doc that way. Attached the updated version of the patch. > > Attached is the updated version of the patch. > > The previous patch used only pgstat_progress_update_param() > even when updating multiple values. Since those updates are > not atomic, this can cause readers of the values to see > the intermediate states. To avoid this issue, the latest patch > uses pgstat_progress_update_multi_param(), instead. Attached the updated version of the patch. Barring any objections, I plan to commit this patch. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
From
Sergei Kornilov
Date:
Hello I reviewed a recently published patch. Looks good for me. One small note: the values for the new definitions in progress.h seems not to be aligned vertically. However, pgindentdoesn't objects. regards, Sergei
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Kyotaro Horiguchi
Date:
At Mon, 2 Mar 2020 17:29:30 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > Attached is the updated version of the patch. > > The previous patch used only pgstat_progress_update_param() > > even when updating multiple values. Since those updates are > > not atomic, this can cause readers of the values to see > > the intermediate states. To avoid this issue, the latest patch > > uses pgstat_progress_update_multi_param(), instead. > > Attached the updated version of the patch. > Barring any objections, I plan to commit this patch. It is working as designed and the status names are fine to me. The last one comment from me. The newly defined symbols have inconsistent indents. === #define PROGRESS_BASEBACKUP_PHASE 0 #define PROGRESS_BASEBACKUP_BACKUP_TOTAL 1 #define PROGRESS_BASEBACKUP_BACKUP_STREAMED 2 #define PROGRESS_BASEBACKUP_TBLSPC_TOTAL 3 #define PROGRESS_BASEBACKUP_TBLSPC_STREAMED 4 /* Phases of pg_basebackup (as advertised via PROGRESS_BASEBACKUP_PHASE) */ #define PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT 1 #define PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE 2 #define PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP 3 #define PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE 4 #define PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL 5 ==== regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/02 19:27, Sergei Kornilov wrote: > Hello > > I reviewed a recently published patch. Looks good for me. Thanks for the review! I pushed the patch. > One small note: the values for the new definitions in progress.h seems not to be aligned vertically. However, pgindentdoesn't objects. Yes, I fixed that. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/03 9:27, Kyotaro Horiguchi wrote: > At Mon, 2 Mar 2020 17:29:30 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >>> Attached is the updated version of the patch. >>> The previous patch used only pgstat_progress_update_param() >>> even when updating multiple values. Since those updates are >>> not atomic, this can cause readers of the values to see >>> the intermediate states. To avoid this issue, the latest patch >>> uses pgstat_progress_update_multi_param(), instead. >> >> Attached the updated version of the patch. >> Barring any objections, I plan to commit this patch. > > It is working as designed and the status names are fine to me. Thanks for the review! I pushed the patch. > The last one comment from me. > The newly defined symbols have inconsistent indents. Yes, I fixed that. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
RE: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
"Shinoda, Noriyoshi (PN Japan A&PS Delivery)"
Date:
Hi, Thank you for developing good features. The attached patch is a small fix to the committed documentation. This patch fixes the description literal for the backup_streamedcolumn. Regards, Noriyoshi Shinoda -----Original Message----- From: Fujii Masao [mailto:masao.fujii@oss.nttdata.com] Sent: Tuesday, March 3, 2020 12:09 PM To: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Cc: amitlangote09@gmail.com; masahiko.sawada@2ndquadrant.com; pgsql-hackers@postgresql.org Subject: Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side On 2020/03/03 9:27, Kyotaro Horiguchi wrote: > At Mon, 2 Mar 2020 17:29:30 +0900, Fujii Masao > <masao.fujii@oss.nttdata.com> wrote in >>> Attached is the updated version of the patch. >>> The previous patch used only pgstat_progress_update_param() even >>> when updating multiple values. Since those updates are not atomic, >>> this can cause readers of the values to see the intermediate states. >>> To avoid this issue, the latest patch uses >>> pgstat_progress_update_multi_param(), instead. >> >> Attached the updated version of the patch. >> Barring any objections, I plan to commit this patch. > > It is working as designed and the status names are fine to me. Thanks for the review! I pushed the patch. > The last one comment from me. > The newly defined symbols have inconsistent indents. Yes, I fixed that. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/03 14:37, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote: > Hi, > > Thank you for developing good features. > The attached patch is a small fix to the committed documentation. This patch fixes the description literal for the backup_streamedcolumn. Thanks for the report and patch! Pushed. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Magnus Hagander
Date:
On Mon, Mar 2, 2020 at 10:03 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/03/03 14:37, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote: > > Hi, > > > > Thank you for developing good features. > > The attached patch is a small fix to the committed documentation. This patch fixes the description literal for the backup_streamedcolumn. > > Thanks for the report and patch! Pushed. This patch requires, AIUI, that you add -P to the pg_basebackup commandline in order to get the progress tracked in details serverside. But this also generates output in the client that one might not want. Should we perhaps have a switch in pg_basebackup that enables the server side tracking only, without generating output in the client? //Magnus
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/05 9:31, Magnus Hagander wrote: > On Mon, Mar 2, 2020 at 10:03 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/03/03 14:37, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote: >>> Hi, >>> >>> Thank you for developing good features. >>> The attached patch is a small fix to the committed documentation. This patch fixes the description literal for the backup_streamedcolumn. >> >> Thanks for the report and patch! Pushed. > > This patch requires, AIUI, that you add -P to the pg_basebackup > commandline in order to get the progress tracked in details > serverside. Whether --progress is enabled or not, the pg_stat_progress_basebackup view report the progress of the backup in the server side. But the total amount of data that will be streamed is estimated and reported only when this option is enabled. > But this also generates output in the client that one > might not want. > > Should we perhaps have a switch in pg_basebackup that enables the > server side tracking only, without generating output in the client? Yes, this sounds reasonable. I have two ideas. (1) Extend --progress option so that it accepts the setting values like none, server, both (better names?). If both is specified, PROGRESS option is specified in BASE_BACKUP replication command and the total backup size is estimated in the server side, but the progress is not reported in a client side. If none, PROGRESS option is not specified in BASE_BACKUP. The demerit of this idea is that --progress option without argument is not supported yet and the existing application using --progress option when using pg_basebackup needs to be updated when upgrading PostgreSQL version to v13. (2) Add something like --estimate-backup-size (better name?) option to pg_basebackup. If it's specified, PROGRESS option is specified but the progress is not reported in a client side. Thought? Or, as another approach, it might be worth considering to make the server always estimate the total backup size whether --progress is specified or not, as Amit argued upthread. If the time required to estimate the backup size is negligible compared to total backup time, IMO this approach seems better. If we adopt this, we can also get rid of PROGESS option from BASE_BACKUP replication command. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Peter Eisentraut
Date:
On 2020-03-05 05:53, Fujii Masao wrote: > Or, as another approach, it might be worth considering to make > the server always estimate the total backup size whether --progress is > specified or not, as Amit argued upthread. If the time required to > estimate the backup size is negligible compared to total backup time, > IMO this approach seems better. If we adopt this, we can also get > rid of PROGESS option from BASE_BACKUP replication command. I think that would be preferable. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Julien Rouhaud
Date:
On Thu, Mar 5, 2020 at 8:15 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-03-05 05:53, Fujii Masao wrote: > > Or, as another approach, it might be worth considering to make > > the server always estimate the total backup size whether --progress is > > specified or not, as Amit argued upthread. If the time required to > > estimate the backup size is negligible compared to total backup time, > > IMO this approach seems better. If we adopt this, we can also get > > rid of PROGESS option from BASE_BACKUP replication command. > > I think that would be preferable. +1
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Jehan-Guillaume de Rorthais
Date:
On Thu, 5 Mar 2020 10:32:45 +0100 Julien Rouhaud <rjuju123@gmail.com> wrote: > On Thu, Mar 5, 2020 at 8:15 AM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > > > On 2020-03-05 05:53, Fujii Masao wrote: > > > Or, as another approach, it might be worth considering to make > > > the server always estimate the total backup size whether --progress is > > > specified or not, as Amit argued upthread. If the time required to > > > estimate the backup size is negligible compared to total backup time, > > > IMO this approach seems better. If we adopt this, we can also get > > > rid of PROGESS option from BASE_BACKUP replication command. > > > > I think that would be preferable. > > +1 +1
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Kyotaro Horiguchi
Date:
At Thu, 5 Mar 2020 10:32:45 +0100, Julien Rouhaud <rjuju123@gmail.com> wrote in > On Thu, Mar 5, 2020 at 8:15 AM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > > > On 2020-03-05 05:53, Fujii Masao wrote: > > > Or, as another approach, it might be worth considering to make > > > the server always estimate the total backup size whether --progress is > > > specified or not, as Amit argued upthread. If the time required to > > > estimate the backup size is negligible compared to total backup time, > > > IMO this approach seems better. If we adopt this, we can also get > > > rid of PROGESS option from BASE_BACKUP replication command. > > > > I think that would be preferable. > > +1 +1 -- Kyotaro Horiguchi NTT Open Source Software Center
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Magnus Hagander
Date:
On Wed, Mar 4, 2020 at 11:15 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-03-05 05:53, Fujii Masao wrote: > > Or, as another approach, it might be worth considering to make > > the server always estimate the total backup size whether --progress is > > specified or not, as Amit argued upthread. If the time required to > > estimate the backup size is negligible compared to total backup time, > > IMO this approach seems better. If we adopt this, we can also get > > rid of PROGESS option from BASE_BACKUP replication command. > > I think that would be preferable. From a UI perspective I definitely agree. The problem with that one is that it can take a non-trivlal amount of time, that's why it was made an option (in the protocol) in the first place. Particularly if you have a database with many small objets. Is it enough to care about? I'm not sure, but it's definitely something to consider. It was not negligible in some tests I ran then, but it is quite some time ago and reality has definitely changed since then. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/06 0:45, Magnus Hagander wrote: > On Wed, Mar 4, 2020 at 11:15 PM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> >> On 2020-03-05 05:53, Fujii Masao wrote: >>> Or, as another approach, it might be worth considering to make >>> the server always estimate the total backup size whether --progress is >>> specified or not, as Amit argued upthread. If the time required to >>> estimate the backup size is negligible compared to total backup time, >>> IMO this approach seems better. If we adopt this, we can also get >>> rid of PROGESS option from BASE_BACKUP replication command. >> >> I think that would be preferable. > > From a UI perspective I definitely agree. > > The problem with that one is that it can take a non-trivlal amount of > time, that's why it was made an option (in the protocol) in the first > place. Particularly if you have a database with many small objets. Yeah, this is why I made the server estimate the total backup size only when --progress is specified. Another idea is; - Make pg_basebackup specify PROGRESS option in BASE_BACKUP command whether --progress is specified or not. This causes the server to estimate the total backup size even when users don't specify --progress. - Change pg_basebackup so that it treats --progress option as just a knob to determine whether to report the progress in a client-side. - Add new option like --no-estimate-backup-size (better name?) to pg_basebackup. If this option is specified, pg_basebackup doesn't use PROGRESS in BASE_BACKUP and the server doesn't estimate the backup size. I believe that the time required to estimate the backup size is not so large in most cases, so in the above idea, most users don't need to specify more option for the estimation. This is good for UI perspective. OTOH, users who are worried about the estimation time can use --no-estimate-backup-size option and skip the time-consuming estimation. Thought? Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Magnus Hagander
Date:
On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/03/06 0:45, Magnus Hagander wrote: > > On Wed, Mar 4, 2020 at 11:15 PM Peter Eisentraut > > <peter.eisentraut@2ndquadrant.com> wrote: > >> > >> On 2020-03-05 05:53, Fujii Masao wrote: > >>> Or, as another approach, it might be worth considering to make > >>> the server always estimate the total backup size whether --progress is > >>> specified or not, as Amit argued upthread. If the time required to > >>> estimate the backup size is negligible compared to total backup time, > >>> IMO this approach seems better. If we adopt this, we can also get > >>> rid of PROGESS option from BASE_BACKUP replication command. > >> > >> I think that would be preferable. > > > > From a UI perspective I definitely agree. > > > > The problem with that one is that it can take a non-trivlal amount of > > time, that's why it was made an option (in the protocol) in the first > > place. Particularly if you have a database with many small objets. > > Yeah, this is why I made the server estimate the total backup size > only when --progress is specified. > > Another idea is; > - Make pg_basebackup specify PROGRESS option in BASE_BACKUP command > whether --progress is specified or not. This causes the server to estimate > the total backup size even when users don't specify --progress. > - Change pg_basebackup so that it treats --progress option as just a knob to > determine whether to report the progress in a client-side. > - Add new option like --no-estimate-backup-size (better name?) to > pg_basebackup. If this option is specified, pg_basebackup doesn't use > PROGRESS in BASE_BACKUP and the server doesn't estimate the backup size. > > I believe that the time required to estimate the backup size is not so large > in most cases, so in the above idea, most users don't need to specify more > option for the estimation. This is good for UI perspective. > > OTOH, users who are worried about the estimation time can use > --no-estimate-backup-size option and skip the time-consuming estimation. Personally, I think this is the best idea. it brings a "reasonable default", since most people are not going to have this problem, and yet a good way to get out from the issue for those that potentially have it. Especially since we are now already showing the state that "walsender is estimating the size", it should be easy enugh for people to determine if they need to use this flag or not. In nitpicking mode, I'd just call the flag --no-estimate-size -- it's pretty clear things are about backups when you call pg_basebackup, and it keeps the option a bit more reasonable in length. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Kyotaro Horiguchi
Date:
At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander <magnus@hagander.net> wrote in > On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > I believe that the time required to estimate the backup size is not so large > > in most cases, so in the above idea, most users don't need to specify more > > option for the estimation. This is good for UI perspective. > > > > OTOH, users who are worried about the estimation time can use > > --no-estimate-backup-size option and skip the time-consuming estimation. > > Personally, I think this is the best idea. it brings a "reasonable > default", since most people are not going to have this problem, and > yet a good way to get out from the issue for those that potentially > have it. Especially since we are now already showing the state that > "walsender is estimating the size", it should be easy enugh for people > to determine if they need to use this flag or not. > > In nitpicking mode, I'd just call the flag --no-estimate-size -- it's > pretty clear things are about backups when you call pg_basebackup, and > it keeps the option a bit more reasonable in length. I agree to the negative option and the shortened name. What if both --no-estimate-size and -P are specifed? Rejecting as conflicting options or -P supercedes? I would choose the former because we don't know which of them has priority. $ pg_basebackup --no-estimate-size -P pg_basebackup: -P requires size estimate. $ regads. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Magnus Hagander
Date:
On Sun, Mar 8, 2020 at 10:13 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander <magnus@hagander.net> wrote in > > On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > I believe that the time required to estimate the backup size is not so large > > > in most cases, so in the above idea, most users don't need to specify more > > > option for the estimation. This is good for UI perspective. > > > > > > OTOH, users who are worried about the estimation time can use > > > --no-estimate-backup-size option and skip the time-consuming estimation. > > > > Personally, I think this is the best idea. it brings a "reasonable > > default", since most people are not going to have this problem, and > > yet a good way to get out from the issue for those that potentially > > have it. Especially since we are now already showing the state that > > "walsender is estimating the size", it should be easy enugh for people > > to determine if they need to use this flag or not. > > > > In nitpicking mode, I'd just call the flag --no-estimate-size -- it's > > pretty clear things are about backups when you call pg_basebackup, and > > it keeps the option a bit more reasonable in length. > > I agree to the negative option and the shortened name. What if both > --no-estimate-size and -P are specifed? Rejecting as conflicting > options or -P supercedes? I would choose the former because we don't > know which of them has priority. I would definitely prefer rejecting an invalid combination of options. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/09 14:21, Magnus Hagander wrote: > On Sun, Mar 8, 2020 at 10:13 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: >> >> At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander <magnus@hagander.net> wrote in >>> On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>> I believe that the time required to estimate the backup size is not so large >>>> in most cases, so in the above idea, most users don't need to specify more >>>> option for the estimation. This is good for UI perspective. >>>> >>>> OTOH, users who are worried about the estimation time can use >>>> --no-estimate-backup-size option and skip the time-consuming estimation. >>> >>> Personally, I think this is the best idea. it brings a "reasonable >>> default", since most people are not going to have this problem, and >>> yet a good way to get out from the issue for those that potentially >>> have it. Especially since we are now already showing the state that >>> "walsender is estimating the size", it should be easy enugh for people >>> to determine if they need to use this flag or not. >>> >>> In nitpicking mode, I'd just call the flag --no-estimate-size -- it's >>> pretty clear things are about backups when you call pg_basebackup, and >>> it keeps the option a bit more reasonable in length. +1 >> I agree to the negative option and the shortened name. What if both >> --no-estimate-size and -P are specifed? Rejecting as conflicting >> options or -P supercedes? I would choose the former because we don't >> know which of them has priority. > > I would definitely prefer rejecting an invalid combination of options. +1 So, I will make the patch adding support for --no-estimate-size option in pg_basebackup. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/10 11:36, Fujii Masao wrote: > > > On 2020/03/09 14:21, Magnus Hagander wrote: >> On Sun, Mar 8, 2020 at 10:13 PM Kyotaro Horiguchi >> <horikyota.ntt@gmail.com> wrote: >>> >>> At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander <magnus@hagander.net> wrote in >>>> On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>> I believe that the time required to estimate the backup size is not so large >>>>> in most cases, so in the above idea, most users don't need to specify more >>>>> option for the estimation. This is good for UI perspective. >>>>> >>>>> OTOH, users who are worried about the estimation time can use >>>>> --no-estimate-backup-size option and skip the time-consuming estimation. >>>> >>>> Personally, I think this is the best idea. it brings a "reasonable >>>> default", since most people are not going to have this problem, and >>>> yet a good way to get out from the issue for those that potentially >>>> have it. Especially since we are now already showing the state that >>>> "walsender is estimating the size", it should be easy enugh for people >>>> to determine if they need to use this flag or not. >>>> >>>> In nitpicking mode, I'd just call the flag --no-estimate-size -- it's >>>> pretty clear things are about backups when you call pg_basebackup, and >>>> it keeps the option a bit more reasonable in length. > > +1 > >>> I agree to the negative option and the shortened name. What if both >>> --no-estimate-size and -P are specifed? Rejecting as conflicting >>> options or -P supercedes? I would choose the former because we don't >>> know which of them has priority. >> >> I would definitely prefer rejecting an invalid combination of options. > > +1 > > So, I will make the patch adding support for --no-estimate-size option > in pg_basebackup. Patch attached. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > So, I will make the patch adding support for --no-estimate-size option > > in pg_basebackup. > > Patch attached. Like the idea and the patch looks mostly good. + total size. If the estimation is disabled in + <application>pg_basebackup</application> + (i.e., <literal>--no-estimate-size</literal> option is specified), + this is always <literal>0</literal>. "always" seems unnecessary. + This option prevents the server from estimating the total + amount of backup data that will be streamed. In other words, + <literal>backup_total</literal> column in the + <structname>pg_stat_progress_basebackup</structname> + view always indicates <literal>0</literal> if this option is enabled. Here too. Thanks, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/10 22:43, Amit Langote wrote: > On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>> So, I will make the patch adding support for --no-estimate-size option >>> in pg_basebackup. >> >> Patch attached. > > Like the idea and the patch looks mostly good. Thanks for reviewing the patch! > + total size. If the estimation is disabled in > + <application>pg_basebackup</application> > + (i.e., <literal>--no-estimate-size</literal> option is specified), > + this is always <literal>0</literal>. > > "always" seems unnecessary. Fixed. > + This option prevents the server from estimating the total > + amount of backup data that will be streamed. In other words, > + <literal>backup_total</literal> column in the > + <structname>pg_stat_progress_basebackup</structname> > + view always indicates <literal>0</literal> if this option is enabled. > > Here too. Fixed. Attached is the updated version of the patch. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Magnus Hagander
Date:
On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/03/10 22:43, Amit Langote wrote: > > On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>> So, I will make the patch adding support for --no-estimate-size option > >>> in pg_basebackup. > >> > >> Patch attached. > > > > Like the idea and the patch looks mostly good. > > Thanks for reviewing the patch! > > > + total size. If the estimation is disabled in > > + <application>pg_basebackup</application> > > + (i.e., <literal>--no-estimate-size</literal> option is specified), > > + this is always <literal>0</literal>. > > > > "always" seems unnecessary. > > Fixed. > > > + This option prevents the server from estimating the total > > + amount of backup data that will be streamed. In other words, > > + <literal>backup_total</literal> column in the > > + <structname>pg_stat_progress_basebackup</structname> > > + view always indicates <literal>0</literal> if this option is enabled. > > > > Here too. > > Fixed. > > Attached is the updated version of the patch. Would it perhaps be better to return NULL instead of 0 in the statistics view if there is no data? Also, should it really be the server version that decides how this feature behaves, and not the pg_basebackup version? Given that the implementation is entirely in the client, it seems that's more logical? and a few docs nitpicks: <para> Whether this is enabled or not, the <structname>pg_stat_progress_basebackup</structname> view - report the progress of the backup in the server side. But note - that the total amount of data that will be streamed is estimated - and reported only when this option is enabled. In other words, - <literal>backup_total</literal> column in the view always - indicates <literal>0</literal> if this option is disabled. + report the progress of the backup in the server side. + </para> + <para> + This option is not allowed when using + <option>--no-estimate-size</option>. </para> I think you should just remove that whole paragraph. The details are now listed under the disable parameter. + This option prevents the server from estimating the total + amount of backup data that will be streamed. In other words, + <literal>backup_total</literal> column in the + <structname>pg_stat_progress_basebackup</structname> + view indicates <literal>0</literal> if this option is enabled. I suggest just "This option prevents the server from estimating the total amount of backup data that will be streamed, resulting in the ackup_total column in pg_stat_progress_basebackup to be (zero or NULL depending on above)". (Markup needed on that of course ,but you get the idea) + When this is disabled, the backup will start by enumerating I'd try to avoid the double negation, with something "without this parameter, the backup will start..." + <para> + <application>pg_basebackup</application> asks the server to estimate + the total amount of data that will be streamed by default (unless + <option>--no-estimate-size</option> is specified) in version 13 or later, + and does that only when <option>--progress</option> is specified in + the older versions. + </para> That's an item for the release notes, not for the reference page, I think. It's already explained under the --disable parameter, so I suggest removing this paragraph as well. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Wed, Mar 11, 2020 at 3:39 AM Magnus Hagander <magnus@hagander.net> wrote: > On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > Attached is the updated version of the patch. > > Would it perhaps be better to return NULL instead of 0 in the > statistics view if there is no data? NULL sounds better than 0. Thank you, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/11 13:44, Amit Langote wrote: > On Wed, Mar 11, 2020 at 3:39 AM Magnus Hagander <magnus@hagander.net> wrote: >> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>> Attached is the updated version of the patch. >> >> Would it perhaps be better to return NULL instead of 0 in the >> statistics view if there is no data? > > NULL sounds better than 0. Ok, I will make the patch that changes the view so that NULL is returned instead of 0. I'm thinking to commit that patch after applying the change of pg_basebackup client side first. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/11 3:39, Magnus Hagander wrote: > On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/03/10 22:43, Amit Langote wrote: >>> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>> So, I will make the patch adding support for --no-estimate-size option >>>>> in pg_basebackup. >>>> >>>> Patch attached. >>> >>> Like the idea and the patch looks mostly good. >> >> Thanks for reviewing the patch! >> >>> + total size. If the estimation is disabled in >>> + <application>pg_basebackup</application> >>> + (i.e., <literal>--no-estimate-size</literal> option is specified), >>> + this is always <literal>0</literal>. >>> >>> "always" seems unnecessary. >> >> Fixed. >> >>> + This option prevents the server from estimating the total >>> + amount of backup data that will be streamed. In other words, >>> + <literal>backup_total</literal> column in the >>> + <structname>pg_stat_progress_basebackup</structname> >>> + view always indicates <literal>0</literal> if this option is enabled. >>> >>> Here too. >> >> Fixed. >> >> Attached is the updated version of the patch. > > Would it perhaps be better to return NULL instead of 0 in the > statistics view if there is no data? > > Also, should it really be the server version that decides how this > feature behaves, and not the pg_basebackup version? Given that the > implementation is entirely in the client, it seems that's more > logical? Yeah, you're right. I changed the patch that way. Attached is the updated version of the patch. > and a few docs nitpicks: > > <para> > Whether this is enabled or not, the > <structname>pg_stat_progress_basebackup</structname> view > - report the progress of the backup in the server side. But note > - that the total amount of data that will be streamed is estimated > - and reported only when this option is enabled. In other words, > - <literal>backup_total</literal> column in the view always > - indicates <literal>0</literal> if this option is disabled. > + report the progress of the backup in the server side. > + </para> > + <para> > + This option is not allowed when using > + <option>--no-estimate-size</option>. > </para> > > I think you should just remove that whole paragraph. The details are > now listed under the disable parameter. Fixed. > + This option prevents the server from estimating the total > + amount of backup data that will be streamed. In other words, > + <literal>backup_total</literal> column in the > + <structname>pg_stat_progress_basebackup</structname> > + view indicates <literal>0</literal> if this option is enabled. > > I suggest just "This option prevents the server from estimating the > total amount of backup data that will be streamed, resulting in the > ackup_total column in pg_stat_progress_basebackup to be (zero or NULL > depending on above)". > > (Markup needed on that of course ,but you get the idea) Yes, fixed. > + When this is disabled, the backup will start by enumerating > > I'd try to avoid the double negation, with something "without this > parameter, the backup will start..." Fixed. I used "Without using this option ...". > + <para> > + <application>pg_basebackup</application> asks the server to estimate > + the total amount of data that will be streamed by default (unless > + <option>--no-estimate-size</option> is specified) in version 13 or later, > + and does that only when <option>--progress</option> is specified in > + the older versions. > + </para> > > That's an item for the release notes, not for the reference page, I > think. It's already explained under the --disable parameter, so I > suggest removing this paragraph as well. Fixed. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Magnus Hagander
Date:
On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/03/11 3:39, Magnus Hagander wrote: > > On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> > >> > >> > >> On 2020/03/10 22:43, Amit Langote wrote: > >>> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>> So, I will make the patch adding support for --no-estimate-size option > >>>>> in pg_basebackup. > >>>> > >>>> Patch attached. > >>> > >>> Like the idea and the patch looks mostly good. > >> > >> Thanks for reviewing the patch! > >> > >>> + total size. If the estimation is disabled in > >>> + <application>pg_basebackup</application> > >>> + (i.e., <literal>--no-estimate-size</literal> option is specified), > >>> + this is always <literal>0</literal>. > >>> > >>> "always" seems unnecessary. > >> > >> Fixed. > >> > >>> + This option prevents the server from estimating the total > >>> + amount of backup data that will be streamed. In other words, > >>> + <literal>backup_total</literal> column in the > >>> + <structname>pg_stat_progress_basebackup</structname> > >>> + view always indicates <literal>0</literal> if this option is enabled. > >>> > >>> Here too. > >> > >> Fixed. > >> > >> Attached is the updated version of the patch. > > > > Would it perhaps be better to return NULL instead of 0 in the > > statistics view if there is no data? Did you miss this comment, or not agree? :) > > Also, should it really be the server version that decides how this > > feature behaves, and not the pg_basebackup version? Given that the > > implementation is entirely in the client, it seems that's more > > logical? > > Yeah, you're right. I changed the patch that way. > Attached is the updated version of the patch. The other changes in it look good! //Magnus
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/19 0:37, Magnus Hagander wrote: > On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/03/11 3:39, Magnus Hagander wrote: >>> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>> >>>> >>>> >>>> On 2020/03/10 22:43, Amit Langote wrote: >>>>> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>>> So, I will make the patch adding support for --no-estimate-size option >>>>>>> in pg_basebackup. >>>>>> >>>>>> Patch attached. >>>>> >>>>> Like the idea and the patch looks mostly good. >>>> >>>> Thanks for reviewing the patch! >>>> >>>>> + total size. If the estimation is disabled in >>>>> + <application>pg_basebackup</application> >>>>> + (i.e., <literal>--no-estimate-size</literal> option is specified), >>>>> + this is always <literal>0</literal>. >>>>> >>>>> "always" seems unnecessary. >>>> >>>> Fixed. >>>> >>>>> + This option prevents the server from estimating the total >>>>> + amount of backup data that will be streamed. In other words, >>>>> + <literal>backup_total</literal> column in the >>>>> + <structname>pg_stat_progress_basebackup</structname> >>>>> + view always indicates <literal>0</literal> if this option is enabled. >>>>> >>>>> Here too. >>>> >>>> Fixed. >>>> >>>> Attached is the updated version of the patch. >>> >>> Would it perhaps be better to return NULL instead of 0 in the >>> statistics view if there is no data? > > Did you miss this comment, or not agree? :) Oh, I forgot to attached the patch... Patch attached. This patch needs to be applied after applying add_no_estimate_size_v3.patch. >>> Also, should it really be the server version that decides how this >>> feature behaves, and not the pg_basebackup version? Given that the >>> implementation is entirely in the client, it seems that's more >>> logical? >> >> Yeah, you're right. I changed the patch that way. >> Attached is the updated version of the patch. > > The other changes in it look good! Thanks for the review! Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Attachment
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Magnus Hagander
Date:
On Wed, Mar 18, 2020 at 5:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/03/19 0:37, Magnus Hagander wrote: > > On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> > >> > >> > >> On 2020/03/11 3:39, Magnus Hagander wrote: > >>> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>> > >>>> > >>>> > >>>> On 2020/03/10 22:43, Amit Langote wrote: > >>>>> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>>>>> So, I will make the patch adding support for --no-estimate-size option > >>>>>>> in pg_basebackup. > >>>>>> > >>>>>> Patch attached. > >>>>> > >>>>> Like the idea and the patch looks mostly good. > >>>> > >>>> Thanks for reviewing the patch! > >>>> > >>>>> + total size. If the estimation is disabled in > >>>>> + <application>pg_basebackup</application> > >>>>> + (i.e., <literal>--no-estimate-size</literal> option is specified), > >>>>> + this is always <literal>0</literal>. > >>>>> > >>>>> "always" seems unnecessary. > >>>> > >>>> Fixed. > >>>> > >>>>> + This option prevents the server from estimating the total > >>>>> + amount of backup data that will be streamed. In other words, > >>>>> + <literal>backup_total</literal> column in the > >>>>> + <structname>pg_stat_progress_basebackup</structname> > >>>>> + view always indicates <literal>0</literal> if this option is enabled. > >>>>> > >>>>> Here too. > >>>> > >>>> Fixed. > >>>> > >>>> Attached is the updated version of the patch. > >>> > >>> Would it perhaps be better to return NULL instead of 0 in the > >>> statistics view if there is no data? > > > > Did you miss this comment, or not agree? :) > > Oh, I forgot to attached the patch... Patch attached. > This patch needs to be applied after applying > add_no_estimate_size_v3.patch. :) Hmm. I'm slightly irked by doing the -1 -> NULL conversion in the SQL view. I wonder if it might be worth teaching pg_stat_get_progress_info() about returning NULL? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/19 1:22, Magnus Hagander wrote: > On Wed, Mar 18, 2020 at 5:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/03/19 0:37, Magnus Hagander wrote: >>> On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>> >>>> >>>> >>>> On 2020/03/11 3:39, Magnus Hagander wrote: >>>>> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 2020/03/10 22:43, Amit Langote wrote: >>>>>>> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>>>>> So, I will make the patch adding support for --no-estimate-size option >>>>>>>>> in pg_basebackup. >>>>>>>> >>>>>>>> Patch attached. >>>>>>> >>>>>>> Like the idea and the patch looks mostly good. >>>>>> >>>>>> Thanks for reviewing the patch! >>>>>> >>>>>>> + total size. If the estimation is disabled in >>>>>>> + <application>pg_basebackup</application> >>>>>>> + (i.e., <literal>--no-estimate-size</literal> option is specified), >>>>>>> + this is always <literal>0</literal>. >>>>>>> >>>>>>> "always" seems unnecessary. >>>>>> >>>>>> Fixed. >>>>>> >>>>>>> + This option prevents the server from estimating the total >>>>>>> + amount of backup data that will be streamed. In other words, >>>>>>> + <literal>backup_total</literal> column in the >>>>>>> + <structname>pg_stat_progress_basebackup</structname> >>>>>>> + view always indicates <literal>0</literal> if this option is enabled. >>>>>>> >>>>>>> Here too. >>>>>> >>>>>> Fixed. >>>>>> >>>>>> Attached is the updated version of the patch. >>>>> >>>>> Would it perhaps be better to return NULL instead of 0 in the >>>>> statistics view if there is no data? >>> >>> Did you miss this comment, or not agree? :) >> >> Oh, I forgot to attached the patch... Patch attached. >> This patch needs to be applied after applying >> add_no_estimate_size_v3.patch. > > :) > > Hmm. I'm slightly irked by doing the -1 -> NULL conversion in the SQL > view. I wonder if it might be worth teaching > pg_stat_get_progress_info() about returning NULL? That's possible by - adding the boolean array like st_progress_null[PGSTAT_NUM_PROGRESS_PARAM] that indicates whether each column is NULL or not, into PgBackendStatus - extending pgstat_progress_update_param() and pgstat_progress_update_multi_param() so that they can update the boolean array for NULL - updating the progress reporting code so that the extended versions of function are used I didn't adopt this idea because it looks a bit overkill for the purpose. OTOH, this would be good improvement for the progress reporting infrastructure and I'm fine to implement it. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
Hello, On Thu, Mar 19, 2020 at 1:47 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > On 2020/03/19 1:22, Magnus Hagander wrote: > >>>>> Would it perhaps be better to return NULL instead of 0 in the > >>>>> statistics view if there is no data? > >>> > >>> Did you miss this comment, or not agree? :) > >> > >> Oh, I forgot to attached the patch... Patch attached. > >> This patch needs to be applied after applying > >> add_no_estimate_size_v3.patch. > > > > :) > > > > Hmm. I'm slightly irked by doing the -1 -> NULL conversion in the SQL > > view. I wonder if it might be worth teaching > > pg_stat_get_progress_info() about returning NULL? > > That's possible by > - adding the boolean array like st_progress_null[PGSTAT_NUM_PROGRESS_PARAM] > that indicates whether each column is NULL or not, into PgBackendStatus > - extending pgstat_progress_update_param() and pgstat_progress_update_multi_param() > so that they can update the boolean array for NULL > - updating the progress reporting code so that the extended versions of > function are used > > I didn't adopt this idea because it looks a bit overkill for the purpose. I tend to agree that this would be too many changes for something only one place currently needs to use. > OTOH, this would be good improvement for the progress reporting > infrastructure and I'm fine to implement it. Magnus' idea of checking the values in pg_stat_get_progress_info() to determine whether to return NULL seems fine to me. We will need to update the documentation of st_progress_param, because it currently says: * ...but the meaning of each element in the * st_progress_param array is command-specific. */ ProgressCommandType st_progress_command; Oid st_progress_command_target; int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM]; } PgBackendStatus; If we are to define -1 in st_progress_param[] as NULL to the users, that must be mentioned here. -- Thank you, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Alvaro Herrera
Date:
On 2020-Mar-19, Amit Langote wrote: > Magnus' idea of checking the values in pg_stat_get_progress_info() to > determine whether to return NULL seems fine to me. We will need to > update the documentation of st_progress_param, because it currently > says: > > * ...but the meaning of each element in the > * st_progress_param array is command-specific. > */ > ProgressCommandType st_progress_command; > Oid st_progress_command_target; > int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM]; > } PgBackendStatus; > > If we are to define -1 in st_progress_param[] as NULL to the users, > that must be mentioned here. Hmm, why -1? It seems like a value that we might want to use for other purposes in other params. Maybe INT64_MIN is a better choice? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > On 2020-Mar-19, Amit Langote wrote: > > > Magnus' idea of checking the values in pg_stat_get_progress_info() to > > determine whether to return NULL seems fine to me. We will need to > > update the documentation of st_progress_param, because it currently > > says: > > > > * ...but the meaning of each element in the > > * st_progress_param array is command-specific. > > */ > > ProgressCommandType st_progress_command; > > Oid st_progress_command_target; > > int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM]; > > } PgBackendStatus; > > > > If we are to define -1 in st_progress_param[] as NULL to the users, > > that must be mentioned here. > > Hmm, why -1? It seems like a value that we might want to use for other > purposes in other params. Maybe INT64_MIN is a better choice? Yes, maybe. -- Thank you, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Alvaro Herrera
Date:
On 2020-Mar-19, Amit Langote wrote: > On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > On 2020-Mar-19, Amit Langote wrote: > > > > > Magnus' idea of checking the values in pg_stat_get_progress_info() to > > > determine whether to return NULL seems fine to me. We will need to > > > update the documentation of st_progress_param, because it currently > > > says: > > > > > > * ...but the meaning of each element in the > > > * st_progress_param array is command-specific. > > > */ > > > ProgressCommandType st_progress_command; > > > Oid st_progress_command_target; > > > int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM]; > > > } PgBackendStatus; > > > > > > If we are to define -1 in st_progress_param[] as NULL to the users, > > > that must be mentioned here. > > > > Hmm, why -1? It seems like a value that we might want to use for other > > purposes in other params. Maybe INT64_MIN is a better choice? > > Yes, maybe. Looking at the code involved, I think it's okay to use -1 in that specific param and teach the SQL query to display null when it finds that value. We have plenty of magic numbers in the progress params, and it's always the definition of the view that interprets them. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/19 11:32, Amit Langote wrote: > On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> On 2020-Mar-19, Amit Langote wrote: >> >>> Magnus' idea of checking the values in pg_stat_get_progress_info() to >>> determine whether to return NULL seems fine to me. So you think that the latest patch is good enough? >>> We will need to >>> update the documentation of st_progress_param, because it currently >>> says: >>> >>> * ...but the meaning of each element in the >>> * st_progress_param array is command-specific. >>> */ >>> ProgressCommandType st_progress_command; >>> Oid st_progress_command_target; >>> int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM]; >>> } PgBackendStatus; >>> >>> If we are to define -1 in st_progress_param[] as NULL to the users, >>> that must be mentioned here. >> >> Hmm, why -1? It seems like a value that we might want to use for other >> purposes in other params. Maybe INT64_MIN is a better choice? > > Yes, maybe. I don't think that we need to define the specific value like -1 as NULL globally. Which value should be used for that purpose may vary by each command. Only for pg_stat_progress_basebackup.backup_total, IMO using -1 as special value for NULL is not so bad idea. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Amit Langote
Date:
On Thu, Mar 19, 2020 at 11:45 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > On 2020/03/19 11:32, Amit Langote wrote: > > On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera > > <alvherre@2ndquadrant.com> wrote: > >> On 2020-Mar-19, Amit Langote wrote: > >> > >>> Magnus' idea of checking the values in pg_stat_get_progress_info() to > >>> determine whether to return NULL seems fine to me. > > So you think that the latest patch is good enough? I see that the latest patch modifies pg_stat_progress_basebackup view to return NULL, so not exactly. IIUC, Magnus seems to be advocating to *centralize* this in pg_stat_get_progress_info(), which all views are based on, which means we need to globally define a NULL param value, as Alvaro also pointed out. But... > >>> We will need to > >>> update the documentation of st_progress_param, because it currently > >>> says: > >>> > >>> * ...but the meaning of each element in the > >>> * st_progress_param array is command-specific. > >>> */ > >>> ProgressCommandType st_progress_command; > >>> Oid st_progress_command_target; > >>> int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM]; > >>> } PgBackendStatus; > >>> > >>> If we are to define -1 in st_progress_param[] as NULL to the users, > >>> that must be mentioned here. > >> > >> Hmm, why -1? It seems like a value that we might want to use for other > >> purposes in other params. Maybe INT64_MIN is a better choice? > > > > Yes, maybe. > > I don't think that we need to define the specific value like -1 as NULL globally. > Which value should be used for that purpose may vary by each command. Only for > pg_stat_progress_basebackup.backup_total, IMO using -1 as special value for > NULL is not so bad idea. This is the first instance of needing to display NULL in a progress view, so a non-general solution may be enough for now. IOW, your latest patch is good enough for that. :) -- Thank you, Amit
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/19 1:13, Fujii Masao wrote: > > > On 2020/03/19 0:37, Magnus Hagander wrote: >> On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>> >>> >>> >>> On 2020/03/11 3:39, Magnus Hagander wrote: >>>> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>> >>>>> >>>>> >>>>> On 2020/03/10 22:43, Amit Langote wrote: >>>>>> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>>>> So, I will make the patch adding support for --no-estimate-size option >>>>>>>> in pg_basebackup. >>>>>>> >>>>>>> Patch attached. >>>>>> >>>>>> Like the idea and the patch looks mostly good. >>>>> >>>>> Thanks for reviewing the patch! >>>>> >>>>>> + total size. If the estimation is disabled in >>>>>> + <application>pg_basebackup</application> >>>>>> + (i.e., <literal>--no-estimate-size</literal> option is specified), >>>>>> + this is always <literal>0</literal>. >>>>>> >>>>>> "always" seems unnecessary. >>>>> >>>>> Fixed. >>>>> >>>>>> + This option prevents the server from estimating the total >>>>>> + amount of backup data that will be streamed. In other words, >>>>>> + <literal>backup_total</literal> column in the >>>>>> + <structname>pg_stat_progress_basebackup</structname> >>>>>> + view always indicates <literal>0</literal> if this option is enabled. >>>>>> >>>>>> Here too. >>>>> >>>>> Fixed. >>>>> >>>>> Attached is the updated version of the patch. >>>> >>>> Would it perhaps be better to return NULL instead of 0 in the >>>> statistics view if there is no data? >> >> Did you miss this comment, or not agree? :) > > Oh, I forgot to attached the patch... Patch attached. > This patch needs to be applied after applying > add_no_estimate_size_v3.patch. > >>>> Also, should it really be the server version that decides how this >>>> feature behaves, and not the pg_basebackup version? Given that the >>>> implementation is entirely in the client, it seems that's more >>>> logical? >>> >>> Yeah, you're right. I changed the patch that way. >>> Attached is the updated version of the patch. >> >> The other changes in it look good! > > Thanks for the review! Pushed! Thanks! Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/19 12:02, Amit Langote wrote: > On Thu, Mar 19, 2020 at 11:45 AM Fujii Masao > <masao.fujii@oss.nttdata.com> wrote: >> On 2020/03/19 11:32, Amit Langote wrote: >>> On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera >>> <alvherre@2ndquadrant.com> wrote: >>>> On 2020-Mar-19, Amit Langote wrote: >>>> >>>>> Magnus' idea of checking the values in pg_stat_get_progress_info() to >>>>> determine whether to return NULL seems fine to me. >> >> So you think that the latest patch is good enough? > > I see that the latest patch modifies pg_stat_progress_basebackup view > to return NULL, so not exactly. IIUC, Magnus seems to be advocating > to *centralize* this in pg_stat_get_progress_info(), which all views > are based on, which means we need to globally define a NULL param > value, as Alvaro also pointed out. > > But... > >>>>> We will need to >>>>> update the documentation of st_progress_param, because it currently >>>>> says: >>>>> >>>>> * ...but the meaning of each element in the >>>>> * st_progress_param array is command-specific. >>>>> */ >>>>> ProgressCommandType st_progress_command; >>>>> Oid st_progress_command_target; >>>>> int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM]; >>>>> } PgBackendStatus; >>>>> >>>>> If we are to define -1 in st_progress_param[] as NULL to the users, >>>>> that must be mentioned here. >>>> >>>> Hmm, why -1? It seems like a value that we might want to use for other >>>> purposes in other params. Maybe INT64_MIN is a better choice? >>> >>> Yes, maybe. >> >> I don't think that we need to define the specific value like -1 as NULL globally. >> Which value should be used for that purpose may vary by each command. Only for >> pg_stat_progress_basebackup.backup_total, IMO using -1 as special value for >> NULL is not so bad idea. > > This is the first instance of needing to display NULL in a progress > view, so a non-general solution may be enough for now. IOW, your > latest patch is good enough for that. :) Ok, so barring any objection, I will commit the latest patch. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Andres Freund
Date:
Hi, On 2020-03-19 17:21:38 +0900, Fujii Masao wrote: > Pushed! Thanks! FWIW, I'm a bit doubtful that incuring the overhead of this by default on everybody is a nice thing. On filesystems with high latency and with a lot of small relations the overhead of stating a lot of files can be almost as high as the actual base backup. Greetings, Andres Freund
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/20 3:39, Andres Freund wrote: > Hi, > > On 2020-03-19 17:21:38 +0900, Fujii Masao wrote: >> Pushed! Thanks! > > FWIW, I'm a bit doubtful that incuring the overhead of this by default > on everybody is a nice thing. On filesystems with high latency and with > a lot of small relations the overhead of stating a lot of files can be > almost as high as the actual base backup. Yeah, so if we receive lots of complaints like that during beta and RC phases, we should consider to change the default behavior. Also maybe I should measure how long the estimation takes on the env where, for example, ten thousand tables (i.e., files) exist, in order to whether the default behavior is really time-consuming or not? Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side
From
Fujii Masao
Date:
On 2020/03/19 17:22, Fujii Masao wrote: > > > On 2020/03/19 12:02, Amit Langote wrote: >> On Thu, Mar 19, 2020 at 11:45 AM Fujii Masao >> <masao.fujii@oss.nttdata.com> wrote: >>> On 2020/03/19 11:32, Amit Langote wrote: >>>> On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera >>>> <alvherre@2ndquadrant.com> wrote: >>>>> On 2020-Mar-19, Amit Langote wrote: >>>>> >>>>>> Magnus' idea of checking the values in pg_stat_get_progress_info() to >>>>>> determine whether to return NULL seems fine to me. >>> >>> So you think that the latest patch is good enough? >> >> I see that the latest patch modifies pg_stat_progress_basebackup view >> to return NULL, so not exactly. IIUC, Magnus seems to be advocating >> to *centralize* this in pg_stat_get_progress_info(), which all views >> are based on, which means we need to globally define a NULL param >> value, as Alvaro also pointed out. >> >> But... >> >>>>>> We will need to >>>>>> update the documentation of st_progress_param, because it currently >>>>>> says: >>>>>> >>>>>> * ...but the meaning of each element in the >>>>>> * st_progress_param array is command-specific. >>>>>> */ >>>>>> ProgressCommandType st_progress_command; >>>>>> Oid st_progress_command_target; >>>>>> int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM]; >>>>>> } PgBackendStatus; >>>>>> >>>>>> If we are to define -1 in st_progress_param[] as NULL to the users, >>>>>> that must be mentioned here. >>>>> >>>>> Hmm, why -1? It seems like a value that we might want to use for other >>>>> purposes in other params. Maybe INT64_MIN is a better choice? >>>> >>>> Yes, maybe. >>> >>> I don't think that we need to define the specific value like -1 as NULL globally. >>> Which value should be used for that purpose may vary by each command. Only for >>> pg_stat_progress_basebackup.backup_total, IMO using -1 as special value for >>> NULL is not so bad idea. >> >> This is the first instance of needing to display NULL in a progress >> view, so a non-general solution may be enough for now. IOW, your >> latest patch is good enough for that. :) > > Ok, so barring any objection, I will commit the latest patch. Pushed! Thanks all! Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters