Re: Something else about Redo Logs disappearing - Mailing list pgsql-general

From Magnus Hagander
Subject Re: Something else about Redo Logs disappearing
Msg-id CABUevEwVg7MuUySqeGkQ7uJKB2LWBkEgJj+qDeLWndMZpoY7Bg@mail.gmail.com
In response to Re: Something else about Redo Logs disappearing  (Peter <pmc@citylink.dinoex.sub.org>)
Responses Re: Something else about Redo Logs disappearing  (Peter <pmc@citylink.dinoex.sub.org>)
List pgsql-general


On Sat, Jun 13, 2020 at 10:13 PM Peter <pmc@citylink.dinoex.sub.org> wrote:
On Thu, Jun 11, 2020 at 10:35:13PM +0200, Magnus Hagander wrote:
! > Okay. So let's behave like professional people and figure out how that
! > can be achieved:
! > At first, we drop that WAL requirement, because with WAL archiving
! > it is already guaranteed that an unbroken chain of WAL is always
! > present in the backup (except when we have a bug like the one that
! > led to this discussion).
! > So this is **not part of the scope**.
! >
!
! I would assume that anybody who deals with backups professionally wouldn't
! consider that out of scope,

I strongly disagree. I might suppose You haven't thought this to the
proper end. See:

You may disagree, but I would argue that this is because you are the one who has not thought it through. But hey, let's agree to disagree.


You can see that all the major attributes (scheduling, error-handling,
signalling, ...) of a WAL backup are substantially different to that
of any usual backup. 
This is a different *Class* of backup object, therefore it needs an
appropriate infrastructure that can handle these attributes correctly.

Yes, this is *exactly* why special-handling the WAL during the base backup makes a lot of sense.

Is it required? No.
Will it make your backups more reliable? Yes.

But it depends on what your priorities are.


But, if You never have considered *continuous* archiving, and only
intend to take a functional momentarily backup of a cluster, then You
may well have never noticed these differences. I noticed them mainly
because I did *BUILD* such an infrastructure (the 20 lines of shell
script, you know).

Yes, if you take a simplistic view of your backups.


And yes, I was indeed talking about *professional* approaches.

Sure.



! There is *absolutely* no need for threading to use the current APIs. You
! need to run one query, go do something else, and then run another
! query.

Wrong. The point is, I don't want to "go do something else", I have to
exit() and get back to the initiator at that place.

That is not a requirement of the current PostgreSQL APIs. (In fact, using threading would add a significant extra burden there, as libpq does not allow a connection to be used by several threads at the same time.)

That is a requirement, and indeed a pretty sharp limitation, of the *other* APIs you are working with, it sounds like.

The PostgreSQL APIs discussed do *not* require you to do an exit(). Nor do they require any form of threading.

And the fact that you need to do an exit() would negate any threading anyway, so that seems to be a false argument regardless.
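
To make the "one query, something else, another query" flow concrete, here is a minimal sketch in Python. It assumes a DB-API connection such as one opened with psycopg2 (the %s placeholder style is psycopg2's), and it uses the function names pg_start_backup() and pg_stop_backup() as they exist in current releases. The name non_exclusive_backup and the copy_data_dir callback are made up for illustration; the callback stands in for whatever actually copies the files. It is an illustration, not a complete backup tool.

```python
from typing import Callable, Tuple


def non_exclusive_backup(conn, copy_data_dir: Callable[[], None],
                         label: str = "example") -> Tuple[str, str]:
    """Run a non-exclusive base backup over a single open connection.

    conn is assumed to be an open DB-API connection (for example from
    psycopg2.connect). It has to stay open from start to stop.
    """
    cur = conn.cursor()

    # Query 1: start the backup. The third argument (exclusive) is false.
    cur.execute("SELECT pg_start_backup(%s, false, false)", (label,))
    cur.fetchone()

    # "Go do something else": copy the data directory in the same
    # process, with no threads and no exit() in between.
    copy_data_dir()

    # Query 2: stop the backup on the same connection. The would-be
    # backup_label and tablespace_map contents come back as columns.
    cur.execute("SELECT labelfile, spcmapfile FROM pg_stop_backup(false)")
    labelfile, spcmapfile = cur.fetchone()
    return labelfile, spcmapfile
```

The only state that has to survive between the two queries is the open connection, which is why a tool that can stay running has no need for threads here.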


This is also clearly visible in Laurenz' code: he utilizes two
unchecked background tasks (processes, in this case) with loose
coupling for the purpose, as it does not work otherwise.

Yes, because he is also trying to work around a severely limited API *on the other side*.

There are plenty of backup integrations that don't have this limitation. They all work perfectly fine with no need for exit() and certainly no weird need for special threading.


The most interesting point in there appears to be this:
  > that the backup label and tablespace map files are not written to
  > disk. Instead, their would-be contents are returned in *labelfile
  > and *tblspcmapfile,

This is in do_pg_start_backup() - so we actually HAVE this data
already at the *START* time of the backup! 

Then why in hell do we wait until the END of the backup before we
hand this data to the operator: at a time when the DVD with the

Because it cannot be safely written *into the data directory*.

Now, it could be written *somewhere else*, that is true. And then you would add an extra step at restore time to rename it back. But then your restore would now also require a plugin.

backup is already fixated and cannot be changed anymore, so that
 
You don't need to change the backup, only append to it. If you are calling pg_stop_backup() at a time when that is no longer possible, then you are calling pg_stop_backup() at the wrong time.
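
As a rough illustration of "append, not change": if the backup happens to be an uncompressed tar archive that is still open for writing, the label text returned by pg_stop_backup() can be added as one more member at the end. The sketch below uses only Python's standard tarfile module; the function name append_backup_label is made up for this example, and the uncompressed tar format is an assumption (tarfile cannot append to compressed archives).

```python
import io
import tarfile
import time


def append_backup_label(tar_path: str, labelfile_text: str,
                        name: str = "backup_label") -> None:
    """Append the label text as a new member of an existing tar archive.

    Existing members are left untouched; the archive only grows.
    """
    data = labelfile_text.encode("utf-8")
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    info.mtime = int(time.time())
    info.mode = 0o600
    # Mode "a" appends to an uncompressed archive without rewriting it.
    with tarfile.open(tar_path, "a") as tar:
        tar.addfile(info, io.BytesIO(data))
```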


As I can read, there is no difference in the function requirements
between exclusive and non-exclusive mode, in that regard: the
backup-label file is NOT necessary in the running cluster data tree,
BUT it should get into the RESTORED data tree before starting it.

Correct. It is in fact actively harmful in the running cluster data tree.
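
The restore-side half of that can be sketched just as briefly. Assuming the labelfile and spcmapfile strings returned by pg_stop_backup() were stored alongside the backup, they are written into the *restored* tree before the server is started, and tablespace_map is only written when its content is non-empty. The function name install_backup_label is hypothetical.

```python
from pathlib import Path


def install_backup_label(restored_dir: str, labelfile: str,
                         spcmapfile: str = "") -> None:
    """Place the label files into a restored data directory.

    This must happen before the restored cluster is started, and never
    in the data directory of the running source cluster.
    """
    root = Path(restored_dir)
    (root / "backup_label").write_text(labelfile)
    # An empty spcmapfile means there is nothing to map, so no file.
    if spcmapfile:
        (root / "tablespace_map").write_text(spcmapfile)
```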


And I can't find a single one of those "big problems". What I do find
is just people whining that their cluster doesn't start and they can't
simply delete a file, even if told so. Like soldier complaining that
his gun doesn't shoot and he has no idea how to reload.

Have you actually tried it? Or dealt with the many people who have run into corruption around this?

Again, as suggested before, review the discussions that led up to the changes. There are plenty of examples there.


! > I now hope very much that Magnus Hagander will tell some of the
! > impeding "failure scenarios", because I am getting increasingly
! > tired of pondering about probable ones, and searching the old
! > list entries for them, without finding something substantial.

! Feel free to look at the mailing list archives. Many of them have been
! explained there before. Pay particular attention to the threads around when
! the deprecated APIs were actually deprecated.

I *DID* read all that stuff. About hundred messages. It is HORRIBLE.
I was tearing out my hair in despair.  

To subsume: it all circles around catering for gross pilot error and
stupidity.

Yes, and people not reading the documentation. Or not liking what they read and therefore ignoring it. 


//Magnus
