Re: Review: Patch to compute Max LSN of Data Pages - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Review: Patch to compute Max LSN of Data Pages |
Date | |
Msg-id | CA+Tgmoag8myqtu9x6M_AcMSE8sNECuKhWeDuLdNOnOf8-0Kbgg@mail.gmail.com Whole thread Raw |
In response to | Re: Review: Patch to compute Max LSN of Data Pages (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: Review: Patch to compute Max LSN of Data Pages
Re: Review: Patch to compute Max LSN of Data Pages |
List | pgsql-hackers |
On Tue, Jun 25, 2013 at 1:42 PM, Andres Freund <andres@2ndquadrant.com> wrote: > I think the usecase for this utility isn't big enough to be included in > postgres since it really can only help in a very limited > circumstances. And I think it's too likely to be misused for stuff it's > not useable for (e.g. remastering). > > The only scenario I see is that somebody deleted/corrupted > pg_controldata. In that scenario the tool is supposed to be used to find > the biggest lsn used so far so the user then can use pg_resetxlog to set > that as the wal starting point. > But that can be way much easier solved by just setting the LSN to > something very, very high. The database cannot be used for anything > reliable afterwards anyway. I guess this is true, but I think I'm mildly in favor of including this anyway. I think I would have used it once or twice, if it had been there - maybe not even to feed into pg_resetxlog, but just to check for future LSNs. We don't have anything like a suite of diagnostic tools in bin or contrib today, for use by database professionals in fixing things that fall strictly in the realm of "don't try this at home", and I think we should have such a thing. Unfortunately this covers about 0.1% of the space I'd like to see covered, which might be a reason to reject it or at least insist that it be enhanced first. At any rate, I don't think this is anywhere near committable as it stands today. Some random review comments: remove_parent_refernces() is spelled wrong. Why does this patch need all of this fancy path-handling logic - specifically, remove_parent_refernces() and make_absolute_path()? Surely its needs are not that different from pg_ctl or pg_resetxlog or pg_controldata. If they all need it and it's duplicated, we should fix that. Otherwise, why the special logic here? I don't think we need getLinkPath() either. The server has no trouble finding its files by just using a pathname that follows the symlink. Why do we need anything different here? The whole point of symlinks is that you can traverse them transparently, without knowing where they point. Duplicating PageHeaderIsValid doesn't seem acceptable. Moreover, since the point of this is to be able to use it on a damaged cluster, why is that logic even desirable? I think we'd be better off assuming the pages to be valid. The calling convention for this utility seems quite baroque. There's no real need for all of this -p/-P stuff. I think the syntax should just be: pg_computemaxlsn file_or_directory... For each argument, we determine whether it's a file or a directory. If it's a file, we assume it's a PostgreSQL data file and scan it. If it's a directory, we check whether it looks like a data directory. If it does, we recurse through the whole tree structure and find the data files, and process them. If it doesn't look like a data directory, we scan each plain file in that directory whose name looks like a PostgreSQL data file name. With this approach, there's no need to limit ourselves to a single input argument and no need to specify what kind of argument it is; the tool just figures it out. I think it would be a good idea to have a mode that prints out the max LSN found in *each data file* scanned, and then prints out the overall max found at the end. In fact, I think that should perhaps be the default mode, with -q, --quiet to disable it. When printing out the per-file data, I think it would be useful to track and print the block number where the highest LSN in that file was found. I have definitely had cases where I suspected, but was not certain of, corruption. One could use a tool like this to hunt for problems, and then use something like pg_filedump to dump the offending blocks. That would be a lot easier than running pg_filedump on the whole file and grepping through the output. Similarly, I see no reason for the restriction imposed by check_path_belongs_to_pgdata(). I've had people mail me one data file; why shouldn't I be able to run this tool on it? It's a read-only utility. - if (0 != FindMaxLSNinDir(pathbuf, maxlsn, false)) and similar is not PostgreSQL style. + printf(_("%s compute the maximum LSN in PostgreSQL data pages.\n\n"), progname); s/compute/computes/ + /* + * Don't allow pg_computemaxlsn to be run as root, to avoid overwriting + * the ownership of files in the data directory. We need only check for + * root -- any other user won't have sufficient permissions to modify + * files in the data directory. + */ + #ifndef WIN32 + if (geteuid() == 0) + { + fprintf(stderr, _("%s: cannot be executed by \"root\".\n"), + progname); + fprintf(stderr, _("You must run %s as the PostgreSQL superuser.\n"), + progname); + exit(1); + } + #endif This utility only reads files; it does not modify them. So this seems unnecessary. I assume it's blindly copied from somewhere else. + fprintf(stderr, _("%s: \"%s\" not a valid data directory.\n"), Extra space. + /* + * Check for a postmaster lock file --- if there is one, refuse to + * proceed, on grounds we might be interfering with a live installation. + */ + snprintf(path, MAXPGPATH, "%s/postmaster.pid", DataDir); Again, this might be appropriate for pg_resetxlog, but I see no reason for the restriction here. The output might be inaccurate in that case, but if you're using this tool you're required to know what you're doing. + For safety reasons, you must specify the data directory on the command line. + <command>pg_computemaxlsn</command> does not use the environment variable + <envar>PGDATA</>. Same thing here. I think using PGDATA would be quite appropriate for this utility. + <para> + This command must not be used when the server is + running. <command>pg_computemaxlsn</command> will refuse to start up if + it finds a server lock file in the data directory. If the + server crashed then a lock file might have been left + behind; in that case you can remove the lock file to allow + <command>pg_computemaxlsn</command> to run. But before you do + so, make doubly certain that there is no server process still alive. + </para> More of the same paranoia. Overall my feeling is that this can be simplified quite a lot. There's a lot of stuff in here that's really not needed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: