Re: pgsql: Add TAP tests for pg_verify_checksums - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pgsql: Add TAP tests for pg_verify_checksums
Date
Msg-id 20181019192146.j4my7yjzv4eitw2w@alap3.anarazel.de
Whole thread Raw
In response to Re: pgsql: Add TAP tests for pg_verify_checksums  (Stephen Frost <sfrost@snowman.net>)
Responses Re: pgsql: Add TAP tests for pg_verify_checksums  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
Hi,

On 2018-10-19 14:47:51 -0400, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > On 2018-10-19 14:11:03 -0400, Stephen Frost wrote:
> > > * Andres Freund (andres@anarazel.de) wrote:
> > > > cstore e.g. does this, and it's already out there. So yes, we should
> > > > provide better infrastructure. But for now, we gotta deal with reality,
> > > > unless we just want to gratuituously break things.
> > > 
> > > No, I don't agree that we are beholden to an external extension that
> > > decided to start writing into directories that clearly belong to PG.
> > 
> > Did it have an alternative?
> 
> Yes, work with us to come up with a way to accomplish what they wanted
> without creating unknown/untracked files in our tablespaces.
> 
> Or, have their own mechanism for storing files on other filesystems.
> 
> And likely other options.  Saying that we should account for their
> putting arbitrary files into the PG tablespace directories because they
> "didn't have any other choice" seems pretty ridiculous, imv.
> 
> > > How do we even know that what cstore does in the tablespace directory
> > > wouldn't get caught up in the checksum file pattern-matching that this
> > > commit put in?
> > 
> > You listen to people?
> > 
> > > What if there was an extension which did create files that would
> > > match, what would we do then?  I'm happy to go create one, if that'd
> > > help make the point that this "pattern whitelist" isn't actually a
> > > solution but is really rather just a hack that'll break in the future.
> > 
> > Oh, for crying out loud. Yes, obviously you can create conflicts, nobody
> > ever doubted that. How on earth is that a useful point for anything? The
> > point isn't that it's a good idea for extensions to do so, but that it
> > has happened because we didn't provide better alternative.
> 
> The point is that you're making a case because you happen to know of an
> extension which does this, but neither of us know about every extension
> out there and I, at least, don't believe we should be coming up with
> hacks to try and avoid looking at files that some extension has dumped
> into the PG tablespace directories because that's never been a
> documented or supported thing to do.

It's not a hack if it's a quite defendable choice on its own, and the
presense of such extensions is just one further argument in one
direction (for explicit whitelisting here).


> > > > > > And I fail to see why a blacklist is architecturally better. There's
> > > > > > plenty cases where we might want to create temporary files,
> > > > > > non-checksummed data or such that we'd need a teach a blacklist about,
> > > > > > but there's far fewer cases where add new checksummed files. Basically
> > > > > > never since checksums have been introduced. And if checksums were
> > > > > > introduced for something new, say slrus, we'd ceertainly use
> > > > > > pg_verify_checksum during development.
> > > > > 
> > > > > In cases where we need something temporary, we're going to need to be
> > > > > prepared to clean those up and we had better make it very plain that
> > > > > they're temporary and easy to write code for.  Anything we aren't
> > > > > prepared to blow away on a crash-restart should be checksum'd and in an
> > > > > ideal world, we'd be looking to reduce the blacklist to only those
> > > > > things which are temporary.
> > > > 
> > > > There's pending patches that add support for pg_verify_checksums running
> > > > against a running cluster. We'll not just need to recognize files that
> > > > are there after a graceful shutdown, but with anything that a cluster
> > > > can have there while running.
> > > 
> > > Of course- the same is true with a crash/restart case, so I'm not sure
> > > what you're getting at here.
> > 
> > pg_verify_checksum doesn't support running on a crashed cluster, and I'm
> > not sure it'd make sense to teach it to so (there's not really much it
> > could do to discern whether a block is a torn page that replay will fix,
> > or corrupted).
> 
> ... and this isn't at all relevant, because pg_basebackup does run on a
> running cluster.

I wasn't ever denying that or anything close to it?  My point is that
pg_verify_checksum needs much more filtering than it has now to deal
with that, because it needs to handle all files that could be present,
not just files that could be present after a graceful shutdown.

pg_basebackup's case is *not* really comparable because basebackup.c
already performed a lot of filtering before noChecksumFiles is applied.


> > > I wasn't ever thinking of only the graceful shutdown case- certainly
> > > pg_basebackup operates when the cluster is running also and that's
> > > more what I was thinking about from the start and which is part of
> > > what started the discussion about this commit as that was entirely
> > > ignored in this change.
> > 
> > It was mainly ignored in the original pg_verify_checksums change, not in
> > the commit discussed here. That was broken. That built separate logic.
> 
> As pointed out elsewhere on this thread, the logic was the same between
> the two before this commit...  The code in pg_basebackup came from the
> prior pg_verify_checksums code.  Certainly, some mention of the code
> existing in two places, at least, should have been in the comments.

But the filter for basebackup only comes after the pre-existing
filtering that the basebackup.c code already does. All of:

/*
 * List of files excluded from backups.
 */
static const char *excludeFiles[] =
{
    /* Skip auto conf temporary file. */
    PG_AUTOCONF_FILENAME ".tmp",

    /* Skip current log file temporary file */
    LOG_METAINFO_DATAFILE_TMP,

    /* Skip relation cache because it is rebuilt on startup */
    RELCACHE_INIT_FILENAME,

    /*
     * If there's a backup_label or tablespace_map file, it belongs to a
     * backup started by the user with pg_start_backup().  It is *not* correct
     * for this backup.  Our backup_label/tablespace_map is injected into the
     * tar separately.
     */
    BACKUP_LABEL_FILE,
    TABLESPACE_MAP,

    "postmaster.pid",
    "postmaster.opts",

    /* end of list */
    NULL
};

is not applied, for example.  Nor is:

        /* Skip temporary files */
        if (strncmp(de->d_name,
                    PG_TEMP_FILE_PREFIX,
                    strlen(PG_TEMP_FILE_PREFIX)) == 0)
            continue;

Nor is
        /* Exclude temporary relations */
        if (isDbDir && looks_like_temp_rel_name(de->d_name))
        {
            elog(DEBUG2,
                 "temporary relation file \"%s\" excluded from backup",
                 de->d_name);

            continue;
        }

So, yea, they had:

static const char *const skip[] = {
    "pg_control",
    "pg_filenode.map",
    "pg_internal.init",
    "PG_VERSION",
    NULL,
};

in common, but not more.  All the above exclusions are strictly
necessary.

FWIW, as far as I can tell, pg_verify_checksum appears to be broken in a
lot of (unfortunately) pretty normal scenarios right now. Not just on
windows. Besides the narrow window of crashing while a .tmp file is
present (and then shutting down normally after a crash restart), it also
has the much of wider window of crashing while *any* backend has
temporary files/relations.  As crash-restarts don't release temp files,
they'll still be present after the crash, and a single graceful shutdown
afterwards will leave them in place. No?


> > Both basebackup an verify checksums already only scan certain
> > directories. They *already* are encoding directory structure
> > information.
> 
> Yes, they do.  I'd like to see that expanded in the future as well, of
> course.  Sadly, we've already given up any sense of control over what
> exists in the root of the data directory because of all the random
> config files and directories like 'pg_log' that end up there, but let's
> try to avoid making that same mistake about the directories which we
> create and maintain.

I actually don't think that's a useful goal. ISTM it's going to be very
likely that we'll *add* more different files that can be in data
directories / subdirectories.  Various forms of pluggable storage will
need to have various forms of storing data, and we'll definitely want to
have their data inside the database directory.  Some will want to be
checksummed, for others that won't make sense.  We should always know
which types of files postgres knows about, and treat those in the way we
want. We shouldn't worry about conflicts against things we don't know in
a new major version, but we also shouldn't break things if they're
there.  Realisticaly extensions *have* to experiment before PG has the
support for whatever they're doing, our release schedule is way too long
for anything to just wait for PG to support it nice and proper.


> Also, I feel like maybe you hit send before you intended to..?

Nope.

Greetings,

Andres Freund


pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: pgsql: Add TAP tests for pg_verify_checksums
Next
From: Stephen Frost
Date:
Subject: Re: pgsql: Add TAP tests for pg_verify_checksums