Re: Review: Patch to compute Max LSN of Data Pages - Mailing list pgsql-hackers

From Hari Babu
Subject Re: Review: Patch to compute Max LSN of Data Pages
Date
Msg-id 008b01ce788c$bee694a0$3cb3bde0$@kommi@huawei.com
Whole thread Raw
In response to Re: Review: Patch to compute Max LSN of Data Pages  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wednesday, July 03, 2013 1:26 AM Robert Haas Wrote:
>On Tue, Jun 25, 2013 at 1:42 PM, Andres Freund <andres@2ndquadrant.com>
wrote:
>> I think the usecase for this utility isn't big enough to be included in
>> postgres since it really can only help in a very limited
>> circumstances. And I think it's too likely to be misused for stuff it's
>> not useable for (e.g. remastering).
>>
>> The only scenario I see is that somebody deleted/corrupted
>> pg_controldata. In that scenario the tool is supposed to be used to find
>> the biggest lsn used so far so the user then can use pg_resetxlog to set
>> that as the wal starting point.
>> But that can be way much easier solved by just setting the LSN to
>> something very, very high. The database cannot be used for anything
>> reliable afterwards anyway.

>I guess this is true, but I think I'm mildly in favor of including
>this anyway.  I think I would have used it once or twice, if it had
>been there - maybe not even to feed into pg_resetxlog, but just to
>check for future LSNs.  We don't have anything like a suite of
>diagnostic tools in bin or contrib today, for use by database
>professionals in fixing things that fall strictly in the realm of
>"don't try this at home", and I think we should have such a thing.
>Unfortunately this covers about 0.1% of the space I'd like to see
>covered, which might be a reason to reject it or at least insist that
>it be enhanced first.

>At any rate, I don't think this is anywhere near committable as it
>stands today.  Some random review comments:

Thanks for the detailed review.

>remove_parent_refernces() is spelled wrong.

>Why does this patch need all of this fancy path-handling logic -
>specifically, remove_parent_refernces() and make_absolute_path()?
>Surely its needs are not that different from pg_ctl or pg_resetxlog or
>pg_controldata.  If they all need it and it's duplicated, we should
>fix that.  Otherwise, why the special logic here?

>I don't think we need getLinkPath() either.  The server has no trouble
>finding its files by just using a pathname that follows the symlink.
>Why do we need anything different here?  The whole point of symlinks
>is that you can traverse them transparently, without knowing where
>they point.

Removed the special handling of path functions.

>Duplicating PageHeaderIsValid doesn't seem acceptable.  Moreover,
>since the point of this is to be able to use it on a damaged cluster,
>why is that logic even desirable?  I think we'd be better off assuming
>the pages to be valid.

Corrected.

>The calling convention for this utility seems quite baroque.  There's
>no real need for all of this -p/-P stuff.  I think the syntax should
>just be:

>pg_computemaxlsn file_or_directory...

>For each argument, we determine whether it's a file or a directory.
>If it's a file, we assume it's a PostgreSQL data file and scan it.  If
>it's a directory, we check whether it looks like a data directory.  If
>it does, we recurse through the whole tree structure and find the data
>files, and process them.  If it doesn't look like a data directory, we
>scan each plain file in that directory whose name looks like a
>PostgreSQL data file name.  With this approach, there's no need to
>limit ourselves to a single input argument and no need to specify what
>kind of argument it is; the tool just figures it out.

Changed to accept file or directory without of options.

>I think it would be a good idea to have a mode that prints out the max
>LSN found in *each data file* scanned, and then prints out the overall
>max found at the end.  In fact, I think that should perhaps be the
>default mode, with -q, --quiet to disable it.  When printing out the
>per-file data, I think it would be useful to track and print the block
>number where the highest LSN in that file was found.  I have
>definitely had cases where I suspected, but was not certain of,
>corruption.  One could use a tool like this to hunt for problems, and
>then use something like pg_filedump to dump the offending blocks.
>That would be a lot easier than running pg_filedump on the whole file
>and grepping through the output.

Corrected.

>Similarly, I see no reason for the restriction imposed by
>check_path_belongs_to_pgdata().  I've had people mail me one data
>file; why shouldn't I be able to run this tool on it?  It's a
>read-only utility.

>- if (0 != FindMaxLSNinDir(pathbuf, maxlsn, false)) and similar is not
>PostgreSQL style.

>+       printf(_("%s compute the maximum LSN in PostgreSQL data
>pages.\n\n"), progname);

Fixed.

>+       /*
>+        * Don't allow pg_computemaxlsn to be run as root, to avoid
overwriting
>+        * the ownership of files in the data directory. We need only check
for
>+        * root -- any other user won't have sufficient permissions to
modify
>+        * files in the data directory.
>+        */
>+ #ifndef WIN32
>+       if (geteuid() == 0)
>+       {
>+               fprintf(stderr, _("%s: cannot be executed by \"root\".\n"),
>+                               progname);
>+               fprintf(stderr, _("You must run %s as the PostgreSQL
>superuser.\n"),
>+                               progname);
>+               exit(1);
>+       }
>+ #endif

>This utility only reads files; it does not modify them.  So this seems
>unnecessary.  I assume it's blindly copied from somewhere else.

>+               fprintf(stderr, _("%s: \"%s\" not a valid  data
directory.\n"),

>Extra space.

>+       /*
>+        * Check for a postmaster lock file --- if there is one, refuse to
>+        * proceed, on grounds we might be interfering with a live
installation.
>+        */
>+       snprintf(path, MAXPGPATH, "%s/postmaster.pid", DataDir);

>Again, this might be appropriate for pg_resetxlog, but I see no reason
>for the restriction here.  The output might be inaccurate in that
>case, but if you're using this tool you're required to know what
>you're doing.

Fixed.


>+    For safety reasons, you must specify the data directory on the
>command line.
>+    <command>pg_computemaxlsn</command> does not use the environment
variable
>+    <envar>PGDATA</>.

>Same thing here.  I think using PGDATA would be quite appropriate for
>this utility.

Fixed.

>+   <para>
>+    This command must not be used when the server is
>+    running.  <command>pg_computemaxlsn</command> will refuse to start up
if
>+    it finds a server lock file in the data directory.  If the
>+    server crashed then a lock file might have been left
>+    behind; in that case you can remove the lock file to allow
>+    <command>pg_computemaxlsn</command> to run.  But before you do
>+    so, make doubly certain that there is no server process still alive.
>+   </para>

>More of the same paranoia.

>Overall my feeling is that this can be simplified quite a lot.
>There's a lot of stuff in here that's really not needed.

Corrected.

Please find the updated patch attached.

Regards,
Hari babu.

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: request a new feature in fuzzystrmatch
Next
From: Mark Kirkwood
Date:
Subject: Re: [9.4 CF 1] The Commitfest Slacker List