Re: pgsql: Add TAP tests for pg_verify_checksums - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: pgsql: Add TAP tests for pg_verify_checksums
Date
Msg-id 20181019195728.GQ4184@tamriel.snowman.net
Whole thread Raw
In response to Re: pgsql: Add TAP tests for pg_verify_checksums  (Andres Freund <andres@anarazel.de>)
Responses Re: pgsql: Add TAP tests for pg_verify_checksums  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2018-10-19 14:47:51 -0400, Stephen Frost wrote:
> > * Andres Freund (andres@anarazel.de) wrote:
> > > Oh, for crying out loud. Yes, obviously you can create conflicts, nobody
> > > ever doubted that. How on earth is that a useful point for anything? The
> > > point isn't that it's a good idea for extensions to do so, but that it
> > > has happened because we didn't provide better alternative.
> >
> > The point is that you're making a case because you happen to know of an
> > extension which does this, but neither of us know about every extension
> > out there and I, at least, don't believe we should be coming up with
> > hacks to try and avoid looking at files that some extension has dumped
> > into the PG tablespace directories because that's never been a
> > documented or supported thing to do.
>
> It's not a hack if it's a quite defendable choice on its own, and the
> presense of such extensions is just one further argument in one
> direction (for explicit whitelisting here).

You've not convinced me that an extension dropping files into our
tablespace directories is either something we should accept or that we
should code around such cases, nor that we are doing anything but
putting a hack in place to deal with what is pretty clearly a hack in
its own right, imv.

> > > > Of course- the same is true with a crash/restart case, so I'm not sure
> > > > what you're getting at here.
> > >
> > > pg_verify_checksum doesn't support running on a crashed cluster, and I'm
> > > not sure it'd make sense to teach it to so (there's not really much it
> > > could do to discern whether a block is a torn page that replay will fix,
> > > or corrupted).
> >
> > ... and this isn't at all relevant, because pg_basebackup does run on a
> > running cluster.
>
> I wasn't ever denying that or anything close to it?  My point is that
> pg_verify_checksum needs much more filtering than it has now to deal
> with that, because it needs to handle all files that could be present,
> not just files that could be present after a graceful shutdown.

Perhaps it doesn't today but surely one goal of pg_verify_checksum is to
be able to run it on a running cluster eventually.

> pg_basebackup's case is *not* really comparable because basebackup.c
> already performed a lot of filtering before noChecksumFiles is applied.

This all really just points out that we should have the code for
handling this somewhere common that both pg_verify_checksum and
pg_basebackup can leverage without duplicating all of it.

The specific case that started all of this certainly looks pretty
clearly like a case that both need to deal with.

> > As pointed out elsewhere on this thread, the logic was the same between
> > the two before this commit...  The code in pg_basebackup came from the
> > prior pg_verify_checksums code.  Certainly, some mention of the code
> > existing in two places, at least, should have been in the comments.
>
> But the filter for basebackup only comes after the pre-existing
> filtering that the basebackup.c code already does. All of:
>
> /*
>  * List of files excluded from backups.
>  */
> static const char *excludeFiles[] =
> {
>     /* Skip auto conf temporary file. */
>     PG_AUTOCONF_FILENAME ".tmp",
>
>     /* Skip current log file temporary file */
>     LOG_METAINFO_DATAFILE_TMP,
>
>     /* Skip relation cache because it is rebuilt on startup */
>     RELCACHE_INIT_FILENAME,
>
>     /*
>      * If there's a backup_label or tablespace_map file, it belongs to a
>      * backup started by the user with pg_start_backup().  It is *not* correct
>      * for this backup.  Our backup_label/tablespace_map is injected into the
>      * tar separately.
>      */
>     BACKUP_LABEL_FILE,
>     TABLESPACE_MAP,
>
>     "postmaster.pid",
>     "postmaster.opts",
>
>     /* end of list */
>     NULL
> };
>
> is not applied, for example.  Nor is:
>
>         /* Skip temporary files */
>         if (strncmp(de->d_name,
>                     PG_TEMP_FILE_PREFIX,
>                     strlen(PG_TEMP_FILE_PREFIX)) == 0)
>             continue;
>
> Nor is
>         /* Exclude temporary relations */
>         if (isDbDir && looks_like_temp_rel_name(de->d_name))
>         {
>             elog(DEBUG2,
>                  "temporary relation file \"%s\" excluded from backup",
>                  de->d_name);
>
>             continue;
>         }
>
> So, yea, they had:
>
> static const char *const skip[] = {
>     "pg_control",
>     "pg_filenode.map",
>     "pg_internal.init",
>     "PG_VERSION",
>     NULL,
> };
>
> in common, but not more.  All the above exclusions are strictly
> necessary.

... but all of that code doesn't change that pg_basebackup also needs to
ignore the EXEC_BACKEND created config_exec_params/.new files.

You're right, pg_verify_checksums, with the assumption that it only runs
on a cleanly shut-down cluster, doesn't need the temp-file-skipping
logic, today, but it's going to need it tomorrow, isn't it?  It seems
like a good idea to avoid duplicating all of that code between the two,
hence the suggestion to put it into libpgcommon, and this all seems to
be getting pretty far away from the main point of disagreement we've
been having around how to handle random files that show up in our
tablespace directories.

> FWIW, as far as I can tell, pg_verify_checksum appears to be broken in a
> lot of (unfortunately) pretty normal scenarios right now. Not just on
> windows. Besides the narrow window of crashing while a .tmp file is
> present (and then shutting down normally after a crash restart), it also
> has the much of wider window of crashing while *any* backend has
> temporary files/relations.  As crash-restarts don't release temp files,
> they'll still be present after the crash, and a single graceful shutdown
> afterwards will leave them in place. No?

We do go through and do some cleanup at crash/restart, but I generally
agree that we should really be looking to have pg_verify_checksums work
on a running cluster and should therefore make it able to identify temp
files and skip over things which we know don't have checksums (and
that's ok) in a similar way to what pg_basebackup does.

> > > Both basebackup an verify checksums already only scan certain
> > > directories. They *already* are encoding directory structure
> > > information.
> >
> > Yes, they do.  I'd like to see that expanded in the future as well, of
> > course.  Sadly, we've already given up any sense of control over what
> > exists in the root of the data directory because of all the random
> > config files and directories like 'pg_log' that end up there, but let's
> > try to avoid making that same mistake about the directories which we
> > create and maintain.
>
> I actually don't think that's a useful goal. ISTM it's going to be very
> likely that we'll *add* more different files that can be in data
> directories / subdirectories.  Various forms of pluggable storage will
> need to have various forms of storing data, and we'll definitely want to
> have their data inside the database directory.  Some will want to be
> checksummed, for others that won't make sense.  We should always know
> which types of files postgres knows about, and treat those in the way we
> want. We shouldn't worry about conflicts against things we don't know in
> a new major version, but we also shouldn't break things if they're
> there.  Realisticaly extensions *have* to experiment before PG has the
> support for whatever they're doing, our release schedule is way too long
> for anything to just wait for PG to support it nice and proper.

I'm all for adding more different files- I didn't intend to suggest
otherwise, but those are things we're adding and can/should account for.

As you say, some of those are likely to have checksums, which should be
handled by pg_basebackup and pg_verify_checksums, and that goes back to
the point I was making up-thread that we want to make sure an account
for everything.  With this pattern-based approach, we could easily end
up forgetting to add the correct new pattern into pg_verify_checksums.
By making sure we account for everything and complain if we don't, we
aren't going to miss anything.  That kind of fail-safe is the kind of
design that I think we are generally trying to go for and is part of why
PG is as robust as it is.

Playing this further and assuming that extensions dropping files into
tablespace directories is acceptable, what are we supposed to do when
there's some direct conflict between a file that we want to create in a
PG tablespace directory and a file that some extension dropped there?
Create yet another subdirectory which we call
"THIS_IS_REALLY_ONLY_FOR_PG"?

What about two different extensions wanting to create files with the
same names in the tablespace directories..?

Experimentation is fine, of course, this is open source code and people
should feel free to play with it, but we are not obligated to avoid
breaking things when an extension author, through their experimentation,
does something which is clearly not supported, like dropping files into
PG's tablespace directories.  Further, when it comes to our user's data,
we should be taking a strict approach and accounting for everything,
something that this whitelist-of-patterns-based approach to finding
files to verify the checksums on doesn't do.

Thanks!

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: pgsql: Add TAP tests for pg_verify_checksums
Next
From: Andres Freund
Date:
Subject: Re: pgsql: Add TAP tests for pg_verify_checksums