Re: pgsql: Add TAP tests for pg_verify_checksums - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: pgsql: Add TAP tests for pg_verify_checksums
Date
Msg-id 20181019184751.GP4184@tamriel.snowman.net
Whole thread Raw
In response to Re: pgsql: Add TAP tests for pg_verify_checksums  (Andres Freund <andres@anarazel.de>)
Responses Re: pgsql: Add TAP tests for pg_verify_checksums  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2018-10-19 14:11:03 -0400, Stephen Frost wrote:
> > * Andres Freund (andres@anarazel.de) wrote:
> > > cstore e.g. does this, and it's already out there. So yes, we should
> > > provide better infrastructure. But for now, we gotta deal with reality,
> > > unless we just want to gratuituously break things.
> >
> > No, I don't agree that we are beholden to an external extension that
> > decided to start writing into directories that clearly belong to PG.
>
> Did it have an alternative?

Yes, work with us to come up with a way to accomplish what they wanted
without creating unknown/untracked files in our tablespaces.

Or, have their own mechanism for storing files on other filesystems.

And likely other options.  Saying that we should account for their
putting arbitrary files into the PG tablespace directories because they
"didn't have any other choice" seems pretty ridiculous, imv.

> > How do we even know that what cstore does in the tablespace directory
> > wouldn't get caught up in the checksum file pattern-matching that this
> > commit put in?
>
> You listen to people?
>
> > What if there was an extension which did create files that would
> > match, what would we do then?  I'm happy to go create one, if that'd
> > help make the point that this "pattern whitelist" isn't actually a
> > solution but is really rather just a hack that'll break in the future.
>
> Oh, for crying out loud. Yes, obviously you can create conflicts, nobody
> ever doubted that. How on earth is that a useful point for anything? The
> point isn't that it's a good idea for extensions to do so, but that it
> has happened because we didn't provide better alternative.

The point is that you're making a case because you happen to know of an
extension which does this, but neither of us know about every extension
out there and I, at least, don't believe we should be coming up with
hacks to try and avoid looking at files that some extension has dumped
into the PG tablespace directories because that's never been a
documented or supported thing to do.

> > > > > And I fail to see why a blacklist is architecturally better. There's
> > > > > plenty cases where we might want to create temporary files,
> > > > > non-checksummed data or such that we'd need a teach a blacklist about,
> > > > > but there's far fewer cases where add new checksummed files. Basically
> > > > > never since checksums have been introduced. And if checksums were
> > > > > introduced for something new, say slrus, we'd ceertainly use
> > > > > pg_verify_checksum during development.
> > > >
> > > > In cases where we need something temporary, we're going to need to be
> > > > prepared to clean those up and we had better make it very plain that
> > > > they're temporary and easy to write code for.  Anything we aren't
> > > > prepared to blow away on a crash-restart should be checksum'd and in an
> > > > ideal world, we'd be looking to reduce the blacklist to only those
> > > > things which are temporary.
> > >
> > > There's pending patches that add support for pg_verify_checksums running
> > > against a running cluster. We'll not just need to recognize files that
> > > are there after a graceful shutdown, but with anything that a cluster
> > > can have there while running.
> >
> > Of course- the same is true with a crash/restart case, so I'm not sure
> > what you're getting at here.
>
> pg_verify_checksum doesn't support running on a crashed cluster, and I'm
> not sure it'd make sense to teach it to so (there's not really much it
> could do to discern whether a block is a torn page that replay will fix,
> or corrupted).

... and this isn't at all relevant, because pg_basebackup does run on a
running cluster.

> > I wasn't ever thinking of only the graceful shutdown case- certainly
> > pg_basebackup operates when the cluster is running also and that's
> > more what I was thinking about from the start and which is part of
> > what started the discussion about this commit as that was entirely
> > ignored in this change.
>
> It was mainly ignored in the original pg_verify_checksums change, not in
> the commit discussed here. That was broken. That built separate logic.

As pointed out elsewhere on this thread, the logic was the same between
the two before this commit...  The code in pg_basebackup came from the
prior pg_verify_checksums code.  Certainly, some mention of the code
existing in two places, at least, should have been in the comments.

Also, just to be clear, as I tried to say up-thread, I'm not trying to
lay blame here or suggest that Michael shouldn't have been trying to fix
the issue that came up, but I don't agree that this is the way to fix
it, and, in any case, we need to make sure that pg_basebackup behaves
correctly as well.  As mentioned elsewhere, that'd be best done by
adding something to libpgcommon and then changing both pg_basebackup and
pg_verify_checksums to use that.

> Both basebackup an verify checksums already only scan certain
> directories. They *already* are encoding directory structure
> information.

Yes, they do.  I'd like to see that expanded in the future as well, of
course.  Sadly, we've already given up any sense of control over what
exists in the root of the data directory because of all the random
config files and directories like 'pg_log' that end up there, but let's
try to avoid making that same mistake about the directories which we
create and maintain.

Also, I feel like maybe you hit send before you intended to..?

Thanks!

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Apurv Shah
Date:
Subject: Google Code In Mentorship
Next
From: Andres Freund
Date:
Subject: Re: pgsql: Add TAP tests for pg_verify_checksums