Re: Review: Patch to compute Max LSN of Data Pages - Mailing list pgsql-hackers
From | Hari Babu |
---|---|
Subject | Re: Review: Patch to compute Max LSN of Data Pages |
Date | |
Msg-id | 008b01ce788c$bee694a0$3cb3bde0$@kommi@huawei.com Whole thread Raw |
In response to | Re: Review: Patch to compute Max LSN of Data Pages (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Wednesday, July 03, 2013 1:26 AM Robert Haas Wrote: >On Tue, Jun 25, 2013 at 1:42 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> I think the usecase for this utility isn't big enough to be included in >> postgres since it really can only help in a very limited >> circumstances. And I think it's too likely to be misused for stuff it's >> not useable for (e.g. remastering). >> >> The only scenario I see is that somebody deleted/corrupted >> pg_controldata. In that scenario the tool is supposed to be used to find >> the biggest lsn used so far so the user then can use pg_resetxlog to set >> that as the wal starting point. >> But that can be way much easier solved by just setting the LSN to >> something very, very high. The database cannot be used for anything >> reliable afterwards anyway. >I guess this is true, but I think I'm mildly in favor of including >this anyway. I think I would have used it once or twice, if it had >been there - maybe not even to feed into pg_resetxlog, but just to >check for future LSNs. We don't have anything like a suite of >diagnostic tools in bin or contrib today, for use by database >professionals in fixing things that fall strictly in the realm of >"don't try this at home", and I think we should have such a thing. >Unfortunately this covers about 0.1% of the space I'd like to see >covered, which might be a reason to reject it or at least insist that >it be enhanced first. >At any rate, I don't think this is anywhere near committable as it >stands today. Some random review comments: Thanks for the detailed review. >remove_parent_refernces() is spelled wrong. >Why does this patch need all of this fancy path-handling logic - >specifically, remove_parent_refernces() and make_absolute_path()? >Surely its needs are not that different from pg_ctl or pg_resetxlog or >pg_controldata. If they all need it and it's duplicated, we should >fix that. Otherwise, why the special logic here? >I don't think we need getLinkPath() either. The server has no trouble >finding its files by just using a pathname that follows the symlink. >Why do we need anything different here? The whole point of symlinks >is that you can traverse them transparently, without knowing where >they point. Removed the special handling of path functions. >Duplicating PageHeaderIsValid doesn't seem acceptable. Moreover, >since the point of this is to be able to use it on a damaged cluster, >why is that logic even desirable? I think we'd be better off assuming >the pages to be valid. Corrected. >The calling convention for this utility seems quite baroque. There's >no real need for all of this -p/-P stuff. I think the syntax should >just be: >pg_computemaxlsn file_or_directory... >For each argument, we determine whether it's a file or a directory. >If it's a file, we assume it's a PostgreSQL data file and scan it. If >it's a directory, we check whether it looks like a data directory. If >it does, we recurse through the whole tree structure and find the data >files, and process them. If it doesn't look like a data directory, we >scan each plain file in that directory whose name looks like a >PostgreSQL data file name. With this approach, there's no need to >limit ourselves to a single input argument and no need to specify what >kind of argument it is; the tool just figures it out. Changed to accept file or directory without of options. >I think it would be a good idea to have a mode that prints out the max >LSN found in *each data file* scanned, and then prints out the overall >max found at the end. In fact, I think that should perhaps be the >default mode, with -q, --quiet to disable it. When printing out the >per-file data, I think it would be useful to track and print the block >number where the highest LSN in that file was found. I have >definitely had cases where I suspected, but was not certain of, >corruption. One could use a tool like this to hunt for problems, and >then use something like pg_filedump to dump the offending blocks. >That would be a lot easier than running pg_filedump on the whole file >and grepping through the output. Corrected. >Similarly, I see no reason for the restriction imposed by >check_path_belongs_to_pgdata(). I've had people mail me one data >file; why shouldn't I be able to run this tool on it? It's a >read-only utility. >- if (0 != FindMaxLSNinDir(pathbuf, maxlsn, false)) and similar is not >PostgreSQL style. >+ printf(_("%s compute the maximum LSN in PostgreSQL data >pages.\n\n"), progname); Fixed. >+ /* >+ * Don't allow pg_computemaxlsn to be run as root, to avoid overwriting >+ * the ownership of files in the data directory. We need only check for >+ * root -- any other user won't have sufficient permissions to modify >+ * files in the data directory. >+ */ >+ #ifndef WIN32 >+ if (geteuid() == 0) >+ { >+ fprintf(stderr, _("%s: cannot be executed by \"root\".\n"), >+ progname); >+ fprintf(stderr, _("You must run %s as the PostgreSQL >superuser.\n"), >+ progname); >+ exit(1); >+ } >+ #endif >This utility only reads files; it does not modify them. So this seems >unnecessary. I assume it's blindly copied from somewhere else. >+ fprintf(stderr, _("%s: \"%s\" not a valid data directory.\n"), >Extra space. >+ /* >+ * Check for a postmaster lock file --- if there is one, refuse to >+ * proceed, on grounds we might be interfering with a live installation. >+ */ >+ snprintf(path, MAXPGPATH, "%s/postmaster.pid", DataDir); >Again, this might be appropriate for pg_resetxlog, but I see no reason >for the restriction here. The output might be inaccurate in that >case, but if you're using this tool you're required to know what >you're doing. Fixed. >+ For safety reasons, you must specify the data directory on the >command line. >+ <command>pg_computemaxlsn</command> does not use the environment variable >+ <envar>PGDATA</>. >Same thing here. I think using PGDATA would be quite appropriate for >this utility. Fixed. >+ <para> >+ This command must not be used when the server is >+ running. <command>pg_computemaxlsn</command> will refuse to start up if >+ it finds a server lock file in the data directory. If the >+ server crashed then a lock file might have been left >+ behind; in that case you can remove the lock file to allow >+ <command>pg_computemaxlsn</command> to run. But before you do >+ so, make doubly certain that there is no server process still alive. >+ </para> >More of the same paranoia. >Overall my feeling is that this can be simplified quite a lot. >There's a lot of stuff in here that's really not needed. Corrected. Please find the updated patch attached. Regards, Hari babu.
pgsql-hackers by date: