Thread: Re: [COMMITTERS] pgsql: New SQL functions pg_backup_in_progress() and pg_backup_start_tim

On Fri, Jun 15, 2012 at 1:52 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Fri, Jun 15, 2012 at 1:29 AM, Robert Haas <rhaas@postgresql.org> wrote:
>> New SQL functions pg_backup_in_progress() and pg_backup_start_time()
>>
>> Darold Gilles, reviewed by Gabriele Bartolini and others, rebased by
>> Marco Nenciarini.  Stylistic cleanup and OID fixes by me.
>
> How well is the term "on-line exclusive backup" really settled with
> people? I wonder if we need to add a specific note to the docs saying
> that the function doesn't consider streaming base backups at all, and
> that one should refer to pg_stat_replication for info about those? Or
> really, should the function be pg_exclusive_backup_in_progress()
> perhaps?

Well, if we think that the term "exclusive backup" is not going to be
easily comprehensible, then sticking that into the function name isn't
going to help us much.  I think that's just wordiness for the sake of
being wordy.  I do agree that we could probably improve the clarity of
the documentation along the lines you suggest.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Fri, Jun 15, 2012 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 15, 2012 at 1:52 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Fri, Jun 15, 2012 at 1:29 AM, Robert Haas <rhaas@postgresql.org> wrote:
>>> New SQL functions pg_backup_in_progress() and pg_backup_start_time()
>>>
>>> Darold Gilles, reviewed by Gabriele Bartolini and others, rebased by
>>> Marco Nenciarini.  Stylistic cleanup and OID fixes by me.
>>
>> How well is the term "on-line exclusive backup" really settled with
>> people? I wonder if we need to add a specific note to the docs saying
>> that the function doesn't consider streaming base backups at all, and
>> that one should refer to pg_stat_replication for info about those? Or
>> really, should the function be pg_exclusive_backup_in_progress()
>> perhaps?
>
> Well, if we think that the term "exclusive backup" is not going to be
> easily comprehensible, then sticking that into the function name isn't
> going to help us much.  I think that's just wordiness for the sake of
> being wordy.  I do agree that we could probably improve the clarity of
> the documentation along the lines you suggest.

It would alert people to the existence of the term, and thus help
those who didn't actually read the documentation.

Which actually makes an argument for making that change *anyway*,
because right now the function is incorrectly named. A function named
pg_backup_in_progress() should answer the question "is a backup in
progress". And it doesn't answer that question.
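
To illustrate the naming problem, here is a toy model in Python. It is
not PostgreSQL code: ServerState and both function names are
hypothetical, and the "as committed" behaviour is only what this thread
describes (the function ignores streaming base backups).

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    """Toy model of the two backup paths discussed in this thread."""
    # True while a backup started via pg_start_backup() is running.
    exclusive_backup_running: bool = False
    # Number of streaming base backups (e.g. pg_basebackup) running.
    streaming_base_backups: int = 0


def backup_in_progress_as_committed(state: ServerState) -> bool:
    # Assumption from the thread: only the pg_start_backup() path counts.
    return state.exclusive_backup_running


def any_backup_in_progress(state: ServerState) -> bool:
    # What the name "backup in progress" suggests to a reader.
    return state.exclusive_backup_running or state.streaming_base_backups > 0


# A streaming base backup is running, but no exclusive backup.
state = ServerState(streaming_base_backups=1)
print(backup_in_progress_as_committed(state))  # False
print(any_backup_in_progress(state))           # True
```

The two functions disagree exactly in the case raised above: a
streaming base backup is running and the function still says no.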

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


On 15 June 2012 15:54, Magnus Hagander <magnus@hagander.net> wrote:
> On Fri, Jun 15, 2012 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jun 15, 2012 at 1:52 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>> On Fri, Jun 15, 2012 at 1:29 AM, Robert Haas <rhaas@postgresql.org> wrote:
>>>> New SQL functions pg_backup_in_progress() and pg_backup_start_time()
>>>>
>>>> Darold Gilles, reviewed by Gabriele Bartolini and others, rebased by
>>>> Marco Nenciarini.  Stylistic cleanup and OID fixes by me.
>>>
>>> How well is the term "on-line exclusive backup" really settled with
>>> people? I wonder if we need to add a specific note to the docs saying
>>> that the function doesn't consider streaming base backups at all, and
>>> that one should refer to pg_stat_replication for info about those? Or
>>> really, should the function be pg_exclusive_backup_in_progress()
>>> perhaps?
>>
>> Well, if we think that the term "exclusive backup" is not going to be
>> easily comprehensible, then sticking that into the function name isn't
>> going to help us much.  I think that's just wordiness for the sake of
>> being wordy.  I do agree that we could probably improve the clarity of
>> the documentation along the lines you suggest.
>
> It would alert people to the existence of the term, and thus help
> those who didn't actually read the documentation.
>
> Which actually makes an argument for making that change *anyway*,
> because right now the function is incorrectly named. A function named
> pg_backup_in_progress() should answer the question "is a backup in
> progress". And it doesn't answer that question.

Maybe pg_is_in_backup_mode, which would match the naming convention of
pg_is_in_recovery, and would claim that a backup is actually underway.

--
Thom


On Fri, Jun 15, 2012 at 11:08 PM, Thom Brown <thom@linux.com> wrote:
> On 15 June 2012 15:54, Magnus Hagander <magnus@hagander.net> wrote:
>> On Fri, Jun 15, 2012 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Fri, Jun 15, 2012 at 1:52 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>> On Fri, Jun 15, 2012 at 1:29 AM, Robert Haas <rhaas@postgresql.org> wrote:
>>>>> New SQL functions pg_backup_in_progress() and pg_backup_start_time()
>>>>>
>>>>> Darold Gilles, reviewed by Gabriele Bartolini and others, rebased by
>>>>> Marco Nenciarini.  Stylistic cleanup and OID fixes by me.
>>>>
>>>> How well is the term "on-line exclusive backup" really settled with
>>>> people? I wonder if we need to add a specific note to the docs saying
>>>> that the function doesn't consider streaming base backups at all, and
>>>> that one should refer to pg_stat_replication for info about those? Or
>>>> really, should the function be pg_exclusive_backup_in_progress()
>>>> perhaps?
>>>
>>> Well, if we think that the term "exclusive backup" is not going to be
>>> easily comprehensible, then sticking that into the function name isn't
>>> going to help us much.  I think that's just wordiness for the sake of
>>> being wordy.  I do agree that we could probably improve the clarity of
>>> the documentation along the lines you suggest.
>>
>> It would alert people to the existence of the term, and thus help
>> those who didn't actually read the documentation.
>>
>> Which actually makes an argument for making that change *anyway*,
>> because right now the function is incorrectly named. A function named
>> pg_backup_in_progress() should answer the question "is a backup in
>> progress". And it doesn't answer that question.
>
> Maybe pg_is_in_backup_mode, which would match the naming convention of
> pg_is_in_recovery, and would claim that a backup is actually underway.

Wouldn't that make it even more wrong since it doesn't include backups
taken using streaming backups?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


On 15 June 2012 16:09, Magnus Hagander <magnus@hagander.net> wrote:
> On Fri, Jun 15, 2012 at 11:08 PM, Thom Brown <thom@linux.com> wrote:
>> On 15 June 2012 15:54, Magnus Hagander <magnus@hagander.net> wrote:
>>> On Fri, Jun 15, 2012 at 8:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Fri, Jun 15, 2012 at 1:52 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>> On Fri, Jun 15, 2012 at 1:29 AM, Robert Haas <rhaas@postgresql.org> wrote:
>>>>>> New SQL functions pg_backup_in_progress() and pg_backup_start_time()
>>>>>>
>>>>>> Darold Gilles, reviewed by Gabriele Bartolini and others, rebased by
>>>>>> Marco Nenciarini.  Stylistic cleanup and OID fixes by me.
>>>>>
>>>>> How well is the term "on-line exclusive backup" really settled with
>>>>> people? I wonder if we need to add a specific note to the docs saying
>>>>> that the function doesn't consider streaming base backups at all, and
>>>>> that one should refer to pg_stat_replication for info about those? Or
>>>>> really, should the function be pg_exclusive_backup_in_progress()
>>>>> perhaps?
>>>>
>>>> Well, if we think that the term "exclusive backup" is not going to be
>>>> easily comprehensible, then sticking that into the function name isn't
>>>> going to help us much.  I think that's just wordiness for the sake of
>>>> being wordy.  I do agree that we could probably improve the clarity of
>>>> the documentation along the lines you suggest.
>>>
>>> It would alert people to the existence of the term, and thus help
>>> those who didn't actually read the documentation.
>>>
>>> Which actually makes an argument for making that change *anyway*,
>>> because right now the function is incorrectly named. A function named
>>> pg_backup_in_progress() should answer the question "is a backup in
>>> progress". And it doesn't answer that question.
>>
>> Maybe pg_is_in_backup_mode, which would match the naming convention of
>> pg_is_in_recovery, and would claim that a backup is actually underway.
>
> Wouldn't that make it even more wrong since it doesn't include backups
> taken using streaming backups?

Sorry, I meant "*wouldn't* claim that a backup is underway".

--
Thom


On 15.06.2012 17:54, Magnus Hagander wrote:
> On Fri, Jun 15, 2012 at 8:16 PM, Robert Haas<robertmhaas@gmail.com>  wrote:
>> On Fri, Jun 15, 2012 at 1:52 AM, Magnus Hagander<magnus@hagander.net>  wrote:
>>> On Fri, Jun 15, 2012 at 1:29 AM, Robert Haas<rhaas@postgresql.org>  wrote:
>>>> New SQL functions pg_backup_in_progress() and pg_backup_start_time()
>>>>
>>>> Darold Gilles, reviewed by Gabriele Bartolini and others, rebased by
>>>> Marco Nenciarini.  Stylistic cleanup and OID fixes by me.
>>>
>>> How well is the term "on-line exclusive backup" really settled with
>>> people? I wonder if we need to add a specific note to the docs saying
>>> that the function doesn't consider streaming base backups at all, and
>>> that one should refer to pg_stat_replication for info about those? Or
>>> really, should the function be pg_exclusive_backup_in_progress()
>>> perhaps?
>>
>> Well, if we think that the term "exclusive backup" is not going to be
>> easily comprehensible, then sticking that into the function name isn't
>> going to help us much.  I think that's just wordiness for the sake of
>> being wordy.  I do agree that we could probably improve the clarity of
>> the documentation along the lines you suggest.
>
> It would alert people to the existence of the term, and thus help
> those who didn't actually read the documentation.

I'm not sure we want to expose the "exclusive backup" term to users. 
It's a bit confusing. It makes sense in the limited scope in the code in 
xlog.c where it's currently used, but if I wanted to explain what it is 
to users, I don't think I'd choose that term.

> Which actually makes an argument for making that change *anyway*,
> because right now the function is incorrectly named. A function named
> pg_backup_in_progress() should answer the question "is a backup in
> progress". And it doesn't answer that question.

I agree that pg_backup_in_progress() is confusing, if it returns false 
while you're running pg_basebackup. In the doc changes you proposed, you 
call the pg_start/stop_backup() a "low level API" for taking backups. 
That's not suitable for a function name, but I think we should work on 
that, and find a better term that works.

Backup mode? Filesystem backup mode?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


On Fri, Jun 15, 2012 at 11:14 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 15.06.2012 17:54, Magnus Hagander wrote:
>>
>> On Fri, Jun 15, 2012 at 8:16 PM, Robert Haas<robertmhaas@gmail.com>
>>  wrote:
>>>
>>> On Fri, Jun 15, 2012 at 1:52 AM, Magnus Hagander<magnus@hagander.net>
>>>  wrote:
>>>>
>>>> On Fri, Jun 15, 2012 at 1:29 AM, Robert Haas<rhaas@postgresql.org>
>>>>  wrote:
>>>>>
>>>>> New SQL functions pg_backup_in_progress() and pg_backup_start_time()
>>>>>
>>>>> Darold Gilles, reviewed by Gabriele Bartolini and others, rebased by
>>>>> Marco Nenciarini.  Stylistic cleanup and OID fixes by me.
>>>>
>>>>
>>>> How well is the term "on-line exclusive backup" really settled with
>>>> people? I wonder if we need to add a specific note to the docs saying
>>>> that the function doesn't consider streaming base backups at all, and
>>>> that one should refer to pg_stat_replication for info about those? Or
>>>> really, should the function be pg_exclusive_backup_in_progress()
>>>> perhaps?
>>>
>>>
>>> Well, if we think that the term "exclusive backup" is not going to be
>>> easily comprehensible, then sticking that into the function name isn't
>>> going to help us much.  I think that's just wordiness for the sake of
>>> being wordy.  I do agree that we could probably improve the clarity of
>>> the documentation along the lines you suggest.
>>
>>
>> It would alert people to the existence of the term, and thus help
>> those who didn't actually read the documentation.
>
>
> I'm not sure we want to expose the "exclusive backup" term to users. It's a
> bit confusing. It makes sense in the limited scope in the code in xlog.c
> where it's currently used, but if I wanted to explain what it is to users, I
> don't think I'd choose that term.
>
>
>> Which actually makes an argument for making that change *anyway*,
>> because right now the function is incorrectly named. A function named
>> pg_backup_in_progress() should answer the question "is a backup in
>> progress". And it doesn't answer that question.
>
>
> I agree that pg_backup_in_progress() is confusing, if it returns false while
> you're running pg_basebackup. In the doc changes you proposed, you call the
> pg_start/stop_backup() a "low level API" for taking backups. That's not
> suitable for a function name, but I think we should work on that, and find a
> better term that works.
>
> Backup mode? Filesystem backup mode?

We already have backup mode, and it covers both of them really. And
filesystem backup mode is also what pg_basebackup does - it takes a
filesystem backup...

The easiest one I can think of is the "manual backup mode", but in the
other thread Simon didn't like that term.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


On Fri, Jun 15, 2012 at 8:49 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> I agree that pg_backup_in_progress() is confusing, if it returns false while
>> you're running pg_basebackup. In the doc changes you proposed, you call the
>> pg_start/stop_backup() a "low level API" for taking backups. That's not
>> suitable for a function name, but I think we should work on that, and find a
>> better term that works.
>>
>> Backup mode? Filesystem backup mode?
>
> We already have backup mode, and it covers both of them really. And
> filesystem backup mode is also what pg_basebackup does - it takes a
> filesystem backup...
>
> The easiest one I can think of is the "manual backup mode", but in the
> other thread Simon didn't like that term.

Let me make things a bit worse since people are trying to figure out
nomenclature and positioning in the documentation, especially taking
consideration of pg_basebackup:

I think that the "exclusive" nature of the pg_(start|stop)_backup mode
(to use the original terminology under reconsideration) is quite
harmful, related to what was raised in
http://archives.postgresql.org/pgsql-hackers/2009-11/msg00024.php (I
revisited this in
http://archives.postgresql.org/pgsql-hackers/2011-11/msg01696.php).

After mulling over this some more, I am less on the fence about how
unfortunate it is that Postgres cannot restart when doing an
"exclusive" base backup: I think it is a severe anti-feature that
should gradually be retired.  pg_basebackup has the better contract
(whereby some information is very carefully inserted into the backup
to trigger archive recovery), and pg_(start|stop)_backup has a worse
one. There are more people performing archiving than there are writing
archiving tools, and the latter category should just be expected to
carefully get this right as pg_basebackup does.  Tragically,
pg_basebackup's archiving technique does not meet my requirements (and
it's a non-trivial optimization that I'm not sure makes sense in every
case, so I'm not sure it should be added), so those of us with other
archivers are left with workarounds like moving the backup file around
during the backup process.

Such a move would render the notion of a "backup in progress" or
single "backup start time" more or less obsolete.  That's not to say
that more reporting in the meantime shouldn't be added, because
changing the archiving contract will take time, and meanwhile people
are going to have to use the old contract between the archiving
software of choice and Postgres for quite a while.  However, I think
the eventual deprecation of "exclusive" backup mode is where things
should go, and maybe this will change someone's perception of how this
should be represented in documentation.

Finally, this complexity goes away (or rather, is moved, but hopefully
made more cohesive) if one can delegate all WAL persistence to other
software.  The presence of backup_label most basically affects whether
one wishes to recover via restore_command or the pg_xlog directory,
which only mattered insofar as the archiver was an asynchronous
form of replication and pg_xlog was nominally a synchronous one.  It's
becoming less clear to me that these are deserving of being so
distinct in the future: they're all sources of WAL, and with syncrep
and group-commit already available, we might be in a position to
remove some surface area and duplicated concepts in tooling. Slowly.
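
As a rough sketch of the backup_label point above (a simplification of
the statement in this message, not the server's actual startup logic;
the function name preferred_wal_source is hypothetical):

```python
import os
import tempfile


def preferred_wal_source(data_dir: str) -> str:
    """Simplified rule: backup_label present means recover via
    restore_command; otherwise replay from the pg_xlog directory."""
    if os.path.exists(os.path.join(data_dir, "backup_label")):
        return "restore_command"
    return "pg_xlog"


# Demonstrate with a throwaway directory standing in for a data directory.
with tempfile.TemporaryDirectory() as data_dir:
    print(preferred_wal_source(data_dir))   # no backup_label yet
    open(os.path.join(data_dir, "backup_label"), "w").close()
    print(preferred_wal_source(data_dir))   # backup_label now present
```

The sketch only captures the file's role as a switch between the two
WAL sources; it says nothing about the contents of backup_label.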

-- 
fdr


Daniel Farina wrote:
> After mulling over this some more, I am less on the fence about how
> unfortunate it is that Postgres cannot restart when doing an
> "exclusive" base backup: I think it is a severe anti-feature that
> should gradually be retired.

This anti-feature was introduced with
http://archives.postgresql.org/pgsql-committers/2008-04/msg00275.php
following discussions
http://archives.postgresql.org/pgsql-hackers/2007-11/msg00800.php
and
http://archives.postgresql.org/pgsql-hackers/2008-03/msg01033.php

The problem (at least for me) is that without this the server will
often refuse to restart after a clean shutdown, namely when the
WAL segment containing the required checkpoint has already been
archived.

Yours,
Laurenz Albe