Re: Something else about Redo Logs disappearing - Mailing list pgsql-general

From: Stephen Frost
Subject: Re: Something else about Redo Logs disappearing
Msg-id: 20200610002257.GQ6680@tamriel.snowman.net
In response to: Re: Something else about Redo Logs disappearing (Peter <pmc@citylink.dinoex.sub.org>)
List: pgsql-general

Greetings,

* Peter (pmc@citylink.dinoex.sub.org) wrote:
> On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
> ! * Peter (pmc@citylink.dinoex.sub.org) wrote:
> ! > This professional backup solution also offers support for postgres.
> ! > Sadly, it only covers postgres up to Rel.9, and that piece of software
> ! > wasn't touched in the last 6 or 7 years.
> !
> ! Then it certainly doesn't work with the changes in v12, and probably has
> ! other issues, as you allude to.
>
> Just having a look at their webpage, something seems to have been updated
> recently, they now state that they have a new postgres adapter:
>
> https://www.bareos.com/en/company_news/postgres-plugin-en1.html
> Enjoy reading, and tell us what You think.

I'm afraid I'm not particularly interested in performing a pro bono
evaluation of a commercial product, though considering they've put out a
press release with obviously broken links, I already have suspicions of
what I'd find ... (try clicking on their 'experimental/nightly' link).

A quick look at their docs also shows that they refer to
recovery.conf, which hasn't existed since v12 was released back in
September, so, yeah, it isn't exactly current.
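
For context: v12 moved the recovery settings into postgresql.conf and
replaced recovery.conf with the signal files recovery.signal and
standby.signal, and a v12 server refuses to start while a recovery.conf
is still present in the data directory. A minimal sketch of the kind of
pre-flight check a tool could run follows; the function name and the
shape of the returned dictionary are illustrative assumptions, not
anything taken from a real tool.

```python
from pathlib import Path

def recovery_layout(data_dir):
    """Report which recovery-related files exist in a data directory.

    Hypothetical helper: the name and return format are made up for
    illustration.
    """
    d = Path(data_dir)
    found = {
        name: (d / name).exists()
        for name in ("recovery.conf", "recovery.signal", "standby.signal")
    }
    # v12 and later refuse to start when recovery.conf is present.
    found["v12_compatible"] = not found["recovery.conf"]
    return found
```

A tool that still writes recovery.conf on restore would show up here
with v12_compatible set to False.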

> ! > Actually, I am getting very tired of reading that something which can
> ! > easily be done within 20 lines of shell scripting, would need special
> !
> ! This is just simply false- you can't do it properly in 20 lines of shell
> ! scripting.
>
> Well, Your own docs show how to do it with a one-liner. So please
> don't blame me for improving that to 20 lines.

No, the documentation provides an example for the purpose of
understanding how the replacement in the command is done and explicitly
says that you probably shouldn't use that command.
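
To make the "replacement" concrete: in archive_command, %p stands for
the path of the WAL file to archive, %f for its file name alone, and %%
for a literal percent sign. The sketch below mimics that substitution in
Python purely for illustration; the server does this internally, and
expand_archive_command is a hypothetical name.

```python
import re

def expand_archive_command(template, wal_path, wal_name):
    """Expand %p, %f and %% in an archive_command-style template.

    Illustration only: this imitates the documented placeholders,
    it is not how the server implements them.
    """
    values = {"p": wal_path, "f": wal_name, "%": "%"}
    return re.sub(r"%([pf%])", lambda m: values[m.group(1)], template)

# The command the documentation shows as an example, not a recommendation.
DOC_EXAMPLE = (
    "test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f"
)

expanded = expand_archive_command(
    DOC_EXAMPLE,
    "pg_wal/00000001000000A900000065",
    "00000001000000A900000065",
)
```

Seeing the expanded command makes it plain how little it does: one
existence test and one cp, with nothing beyond the exit status to say
whether the archived copy is actually safe.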

> ! Sure, you can write something that has probably next to no
> ! error checking,
>
> Before judging that, one should first specify what precisely is the
> demand.

I really don't need to in order to judge the notion that a 20-line
shell script can perform a backup correctly.

> In my understanding, backup is done via pgdump. The archive logs are
> for emergencies (data corruption, disaster), only. And emergencies
> would usually be handled by some professional people who know what
> they have to do.

No, that's not the case.  pg_dump isn't involved at all in the backups
that we're talking about here, which are physical, file-level backups.

> ! uses the deprecated API that'll cause your systems to
> ! fail to start if you ever happen to have a reboot during a backup
>
> It is highly unlikely that I did never have that happen during 15
> years. So what does that mean? If I throw in a pg_start_backup('bogus'),
> and then restart the cluster, it will not work anymore?

If you perform a pg_start_backup(), have a checkpoint happen such that
older WAL is removed, and then reboot the box or kill -9 postgres, no,
it's not going to start anymore because there's going to be a
backup_label file that is telling the cluster that it needs to start
replaying WAL from an older point in time than what you've got WAL for.
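
To illustrate: backup_label records the WAL segment that replay has to
start from, in a line of the form "START WAL LOCATION: ... (file ...)".
A minimal sketch of checking whether that segment is still in pg_wal
follows. It is a simplification (the server itself goes on to read the
checkpoint record named in the file), and start_wal_available is a
hypothetical helper name.

```python
import re
from pathlib import Path

START_RE = re.compile(
    r"^START WAL LOCATION: \S+ \(file ([0-9A-F]{24})\)", re.MULTILINE
)

def start_wal_available(data_dir):
    """Return (segment_name, present) for the segment backup_label needs.

    Returns None when there is no backup_label in the data directory.
    """
    d = Path(data_dir)
    label = d / "backup_label"
    if not label.exists():
        return None
    m = START_RE.search(label.read_text())
    if m is None:
        raise ValueError("unrecognized backup_label format")
    segment = m.group(1)
    # If a checkpoint has already recycled this segment, startup cannot
    # replay from the point backup_label demands.
    return segment, (d / "pg_wal" / segment).exists()
```

A result whose second element is False is exactly the situation
described above: the label asks for WAL that is no longer there.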

> Lets see...
> Clean stop/start - no issue whatsoever. (LOG:  online backup mode
> canceled)
> kill -9 the whole flock - no issue whatsoever (Log: database system
> was interrupted)
> I won't pull the plug now, but that has certainly happened lots of
> times in the past, and also yielded no issue whatsoever - simply
> because there *never* was *any* issue whatsoever with Postgres (until
> I got the idea to install the relatively fresh R.12 - but that's
> understandable).

Being lucky really isn't what you want to bet on.

> So maybe this problem exists only on Windows?

No, it's not Windows specific.

> And yes, I read that whole horrible discussion, and I could tear my
> hair out, really, concerning the "deprecated API". I suppose You mean
> the mentioning in the docs that the "exclusive low-level backup" is
> somehow deprecated.

Yes, it's deprecated specifically because of the issues outlined above.
They aren't hypothetical, they do happen, and people do get bit by them.

> This is a very big bad. Because: normally you can run the base backup
> as a strictly ordinary file-level backup in "full" mode, just as any
> backup software can do it. You will simply execute the
> pg_start_backup() and pg_stop_backup() commands in the before- and
> after- hooks - and any backup software will offer these hooks.
>
> But now, with the now recommended "non-exclusive low-level backup",
> the task is different: now your before-hook needs to do two things
> at the same time:
>  1. keep a socket open in order to hold the connection to postgres
>     (because postgres will terminate the backup when the socket is
>     closed), and
>  2. invoke exit(0) (because the actual backup will not start until
>     the before- hook has properly delivered a successful exit code).
> And, that is not only difficult, it is impossible.

One would imagine that if the commercial vendors wished to actually
support PG properly, they'd manage to figure out a way to do so that
doesn't involve the kind of hook scripts and poor assumptions made about
them that you're discussing here.

Considering that every single backup solution written specifically for
PG, including the shell-based ones, has managed to figure out how to
work with the new API, it hardly seems impossible for them to do so.
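
As a rough sketch of the usual shape: rather than splitting start and
stop across two hook scripts, one process holds the connection open for
the whole backup. The code below assumes a DB-API style connection (for
example from psycopg2) and the function signatures of PostgreSQL 9.6
through 14; non_exclusive_backup and copy_files are hypothetical names,
and this is not how any particular tool does it.

```python
def non_exclusive_backup(conn, copy_files, label="sketch"):
    """Run a non-exclusive low-level backup over one open connection.

    conn       -- an open DB-API connection; it must stay open until
                  pg_stop_backup() has returned.
    copy_files -- hypothetical callable that does the file-level copy.

    Returns (labelfile, spcmapfile). The label contents must be stored
    with the backup as backup_label, not written into the live data
    directory.
    """
    cur = conn.cursor()
    # Third argument false selects the non-exclusive mode.
    cur.execute("SELECT pg_start_backup(%s, false, false)", (label,))
    # If the copy raises, the caller closes the connection and the
    # server aborts the backup; no backup_label is left behind.
    copy_files()
    cur.execute("SELECT labelfile, spcmapfile FROM pg_stop_backup(false)")
    labelfile, spcmapfile = cur.fetchone()
    return labelfile, spcmapfile
```

Because nothing is written into the data directory, a crash or reboot
in the middle of the copy leaves the cluster able to start normally.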

> So yes, this is really a LOT of work. But the point is: this all is
> not really necessary, because currently the stuff works fine in the
> old way.

Unfortunately, no, it doesn't work fine in the general case. You might
be lucky enough to have it work for a while without failure, but one
doesn't design systems to work in the 'lucky' case and fail badly in
all the others.

> So, well, do away with the old method - but you cannot do it away
> inside of rel.12 - and then I will stay with 12 for as long as
> possible (and I don't think I will be the only one).

You're welcome to stay with it as long as you'd like.  I do hope we
finally rip it out, as was discussed before, in v13.  Of course, we'll
stop supporting v12 about 5 years after we release it.

> I see no point in creating artificial complications, which then create
> a necessity for individual tools to handle them, which then create a
> new requirement for testing and validating all these individual tools -
> as this is strictly against the original idea as Brian Kernighan
> explained it: use simple and versatile tools, and combine these to
> achieve the individual task.

These aren't artificial complications.

> ! PG generally isn't something that can be backed up using the simple file
> ! based backup solutions, as you might appreciate from just considering
> ! the number of tools written to specifically deal with the complexity of
> ! backing up an online PG cluster.
>
> Yes, one could assume that. But then, I would prefer well-founded
> technical reasons for what exactly would not work that way, and why it
> would not work that way. And there seems to be not much about that.

I've explained them above, and they were explained on the thread you
evidently glanced at regarding deprecating the old API.

> And in such a case I tend to trust my own understanding, similar to the
> full_page_writes matter. (In 2008 I heard about ZFS, and I concluded
> that if ZFS is indeed copy-on-write, and if the description of the
> full_page_writes option is correct, then one could safely switch it
> off and free a lot of backup space - factor 10 at that time, with some
> Rel.8. And so I started to use ZFS. Nobody would confirm that at that
> time, but nowadays everybody does it.)

I don't agree that 'everybody does it', nor that it's a particularly
good idea to turn off full_page_writes and depend on ZFS to magic it.
In fact, I'd suggest you go watch this PGCon talk, once it's available
later this month (from a competitor of mine, but a terribly smart
individual, so you don't need to listen to me about it):

https://www.pgcon.org/events/pgcon_2020/schedule/session/101-avoiding-detecting-and-recovering-from-data-corruption/

> This was actually my job as a consultant: to de-mystify technology,
> and make it understandable as an arrangement of well explainable
> pieces of functionality with well-deducible consequences.
> But this is no longer respected today; now people are expected to
> *NOT* understand the technology they handle, and instead believe
> in marketing and that it all is very complicated and unintelligible.

Perhaps I'm wrong, but I tend to feel like I've got a pretty decent
handle on both PostgreSQL and on how file-level backups of it work.

Thanks,

Stephen
