Re: Something else about Redo Logs disappearing - Mailing list pgsql-general

From Peter
Subject Re: Something else about Redo Logs disappearing
Msg-id 20200615125022.GA21249@gate.oper.dinoex.org
In response to Re: Something else about Redo Logs disappearing  (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses Re: Something else about Redo Logs disappearing  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-general
On Mon, Jun 15, 2020 at 11:44:33AM +0200, Laurenz Albe wrote:
! On Sat, 2020-06-13 at 19:48 +0200, Peter wrote:
! > ! >  4. If, by misconfiguration and/or operator error, the backup system
! > ! >     happens to start a second backup in parallel to the first,
! > ! >     then do I correctly assume, both backups will be rendered
! > ! >     inconsistent while this may not be visible to the operator; and
! > ! >     the earlier backup would be flagged as apparently successful while
! > ! >     carrying the wrong (later) label?
! > ! 
! > ! If you are using my scripts and start a second backup while the first
! > ! one is still running, the first backup will be interrupted.
! > 
! > This is not what I am asking. It appears correct to me, that, on
! > the database, the first backup will be interrupted. But on the
! > tape side, this might go unnoticed, and on completion it will
! > successfully receive the termination code from the *SECOND*
! > backup - which means that on tape we will have a seemingly
! > successful backup, which
! >  1. is corrupted, and
! >  2. carries a wrong label.
! 
! That will only happen if the backup that uses my scripts does the
! wrong thing.

Yes. Occasionally software does the wrong thing; that is called a "bug".

! An example:
! 
! - Backup #1 calls "pgpre.sh"
! - Backup #1 starts copying files
! - Backup #2 calls "pgpre.sh".
!   This will cancel the first backup.
! - Backup #1 completes copying files.
! - Backup #1 calls "pgpost.sh".
!   It will receive an error.
!   So it has to invalidate the backup.
! - Backup #2 completes copying files.
! - Backup #2 calls "pgpost.sh".
!   It gets a "backup_label" file and completes the backup.

That's not true.


Now let me see how to compile a bash... and here we go:

! An example:
! 
! - Backup #1 calls "pgpre.sh"

> $ ./pgpre.sh
> backup starting location: 1/C8000058
> $

We now have:
> 24129 10  SJ   0:00.00 /usr/local/bin/bash ./pgpre.sh
> 24130 10  SJ   0:00.00 /usr/local/bin/bash ./pgpre.sh
> 24131 10  SJ   0:00.01 psql -Atq
> 24158 10  SCJ  0:00.00 sleep 5

And:
> postgres=# \d
>          List of relations
>  Schema |  Name  | Type  |  Owner   
> --------+--------+-------+----------
>  public | backup | table | postgres
> (1 row)
>  
> postgres=# select * from backup;
>  id |  state  |  pid  | backup_label | tablespace_map 
> ----+---------+-------+--------------+----------------
>   1 | running | 24132 |              | 
> (1 row)

! - Backup #1 starts copying files

Let's suppose it does now.

! - Backup #2 calls "pgpre.sh".

> $ ./pgpre.sh
> backup starting location: 1/C9000024
> $ FATAL:  terminating connection due to administrator command
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> connection to server was lost
> Backup failed
> ./pgpre.sh: line 93: ${PSQL[1]}: ambiguous redirect
> 
> $ echo $?
> 0
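
One possible reading of the transcript above is that the FATAL and "Backup failed" messages appear after the prompt has already returned, so they may come from a background process rather than from the command whose status is then tested. The following is a minimal sketch of that effect only; `launcher` and `bg_pid` are hypothetical names and none of this is taken from pgpre.sh:

```shell
#!/usr/bin/env bash
# Illustrative only: launcher() is a hypothetical stand-in, not pgpre.sh.

launcher() {
    # Start a background job that fails a moment later, then return at once.
    ( sleep 1; echo "Backup failed" >&2; exit 1 ) &
    bg_pid=$!
    return 0
}

launcher
launcher_status=$?               # what "echo $?" at the prompt would show

bg_status=0
wait "$bg_pid" || bg_status=$?   # only a caller that waits sees the failure

echo "launcher exit status:   $launcher_status"
echo "background exit status: $bg_status"
```

In this sketch the caller sees status 0 from the launcher, and the failure is visible only as text on the terminal unless something explicitly waits for the background job.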

!   This will cancel the first backup.

Yes, it seems it did:

> 25279 10  SJ   0:00.00 /usr/local/bin/bash ./pgpre.sh
> 25280 10  IWJ  0:00.00 /usr/local/bin/bash ./pgpre.sh
> 25281 10  SJ   0:00.01 psql -Atq
> 25402 10  SCJ  0:00.00 sleep 5

> postgres=# \d
>          List of relations
>  Schema |  Name  | Type  |  Owner   
> --------+--------+-------+----------
>  public | backup | table | postgres
> (1 row)
> 
> postgres=# select * from backup;
>  id |  state  |  pid  | backup_label | tablespace_map 
> ----+---------+-------+--------------+----------------
>   1 | running | 25282 |              | 
> (1 row)

! - Backup #1 completes copying files.
! - Backup #1 calls "pgpost.sh".

> $ ./pgpost.sh 
> START WAL LOCATION: 1/C9000024 (file 0000000100000001000000C9)
> CHECKPOINT LOCATION: 1/C9000058
> BACKUP METHOD: streamed
> BACKUP FROM: master
> START TIME: 2020-06-15 14:09:41 CEST
> LABEL: 2020-06-15 14:09:40
> START TIMELINE: 1
>
> $ echo $?
> 0

!   It will receive an error.
!   So it has to invalidate the backup.

Where is the error?

What we now have is this: no processes are left, and the table reads:

>  id |  state   |  pid  |                          backup_label                          | tablespace_map 
> ----+----------+-------+----------------------------------------------------------------+----------------
>   1 | complete | 25282 | START WAL LOCATION: 1/C9000024 (file 0000000100000001000000C9)+| 
>     |          |       | CHECKPOINT LOCATION: 1/C9000058                               +| 
>     |          |       | BACKUP METHOD: streamed                                       +| 
>     |          |       | BACKUP FROM: master                                           +| 
>     |          |       | START TIME: 2020-06-15 14:09:41 CEST                          +| 
>     |          |       | LABEL: 2020-06-15 14:09:40                                    +| 
>     |          |       | START TIMELINE: 1                                             +| 
>     |          |       |                                                                | 
> (1 row)
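
The mismatch can be read directly from the values shown above: backup #1 was told "backup starting location: 1/C8000058", but the label it received says START WAL LOCATION: 1/C9000024. A minimal sketch of comparing the two follows; `label_matches_start` is a hypothetical helper and is not part of pgpre.sh or pgpost.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the start location reported when the backup
# began with the START WAL LOCATION found in the label returned at the end.

label_matches_start() {   # $1 = start location, $2 = backup_label text
    local label_start
    label_start=$(printf '%s\n' "$2" |
        sed -n 's/^START WAL LOCATION: \([^ ]*\).*/\1/p')
    [ "$label_start" = "$1" ]
}

# First two label lines as shown in the transcript.
label='START WAL LOCATION: 1/C9000024 (file 0000000100000001000000C9)
CHECKPOINT LOCATION: 1/C9000058'

# Backup #1 started at 1/C8000058, backup #2 at 1/C9000024.
if label_matches_start "1/C8000058" "$label"; then
    verdict_first="match"
else
    verdict_first="MISMATCH"
fi
echo "backup #1 versus returned label: $verdict_first"
```

Such a comparison only works if the caller keeps the start location from the first script and looks at the label at all, which is an assumption about the backup software, not something the scripts enforce.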

! - Backup #2 completes copying files.
! - Backup #2 calls "pgpost.sh".
!   It gets a "backup_label" file and completes the backup.


Wishful thinking.

BOTH backups are now inconsistent, and the first got the label from
the second, and appears to be intact. Exactly as I said before.
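
The sequence can be reduced to a toy model: one shared slot that every start overwrites and that the first finisher collects. This is a sketch under that assumption only; `pre_backup`, `post_backup` and the slot file are hypothetical stand-ins and do not reproduce the real scripts:

```shell
#!/usr/bin/env bash
# Toy model only: a single file stands in for the one-row "backup" table.

slot=$(mktemp)                 # holds "<state> <label>" for the only row

pre_backup() {                 # $1 = label of the backup being started
    # Unconditionally replace whatever session was there before.
    printf 'running %s\n' "$1" > "$slot"
}

post_backup() {
    local state label
    read -r state label < "$slot"
    if [ "$state" != running ]; then
        echo "no running backup" >&2
        return 1
    fi
    printf 'complete %s\n' "$label" > "$slot"
    echo "$label"              # handed to whichever caller arrives first
}

pre_backup "label-1"           # backup #1 starts
pre_backup "label-2"           # backup #2 starts and replaces #1's session

first_status=0
first_label=$(post_backup) || first_status=$?                # backup #1 ends
second_status=0
second_label=$(post_backup 2>/dev/null) || second_status=$?  # backup #2 ends
rm -f "$slot"

echo "backup #1: status=$first_status label=$first_label"
echo "backup #2: status=$second_status label=$second_label"
```

In this model backup #1 finishes with status 0 while holding the label of backup #2, and backup #2 finds nothing left to collect.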

I don't need to try such things out. I can do logical verification in
my mind, by looking at the code.

And on the same foundation I am saying that this whole new API is a
misconception.


cheerio,
PMc


