Re: Review: Patch to compute Max LSN of Data Pages - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Review: Patch to compute Max LSN of Data Pages |
Date | |
Msg-id | CA+TgmoZqbYHWYbi18FUk-+dGHig=icV+pj-NNav3miy4dajgrQ@mail.gmail.com Whole thread Raw |
In response to | Re: Review: Patch to compute Max LSN of Data Pages (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Review: Patch to compute Max LSN of Data Pages
Re: Review: Patch to compute Max LSN of Data Pages |
List | pgsql-hackers |
On Wed, Oct 9, 2013 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Sep 14, 2013 at 3:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>On Monday, July 08, 2013 5:16 PM Andres Freund wrote: >>>>On 2013-07-08 17:10:43 +0530, Amit Kapila wrote: >>>>> On Monday, July 08, 2013 4:26 PM Andres Freund wrote: >>>>> > On 2013-07-08 16:17:54 +0530, Hari Babu wrote: >>>>> > > + This utility can also be used to decide whether backup is >>>>> > required or not when the data page >>>>> > > + in old-master precedes the last applied LSN in old-standby >>>>> > (i.e., new-master) at the >>>>> > > + moment of the failover. >>>>> > > + </para> >>>>> > > + </refsect1> >>>>> > >>>>> > I don't think this is safe in any interesting set of cases. Am I >>>>> > missing >>>>> > something? >>>>> >>>>> No, you are not missing anything. It can be only used to find max LSN in >>>>> database which can avoid further corruption >> >>>>Why is the patch submitted documenting it as a use-case then? I find it >>>>rather scary if the *patch authors* document a known unsafe use case as >>>>one of the known use-cases. >> >>>I got the problem which can occur with the specified use case. Removed the >>>wrong use case specified above. >>>Thanks for the review, please find the updated patch attached in the mail. >> >> Patch is not getting compiled on Windows, I had made following changes: >> a. updated the patch for resolving Windows build >> b. few documentation changes in (pg_resetxlog.sgml) for spelling >> mistake and minor line change >> c. corrected year for Copyright in file pg_computemaxlsn.c > > I am OK with this patch in its current form, modulo some grammar > issues in the documentation which I can fix before committing. > However, I'm unclear whether there was sufficient consensus to proceed > with this. Can others weigh in? If there is too much residual > unhappiness with this, then we should just mark this as Rejected and > stop wasting time on it; it can be pushed to PGXN or similar even if > we don't put it in core. I didn't hear any other votes. Anyone else have an opinion about this? If I can't get a +1 from anyone who wasn't involved in writing the patch, I'm inclined to think we don't have sufficient consensus to commit this. On further review of the patch, I also found a number of other issues that I think need to be fixed before we could consider committing it: - Per a previous request of mine, the patch has three different modes: it can be run on an individual file, on a directory potentially containing multiple relation files, or on an entire data directory. This is not explained in the documentation. - The patch skips printing an error if attempting to open a file returns ENOENT. I don't see why that case shouldn't print an error. Yeah, it could be legit if you're executing this against a running server, but why are you doing that? And even if you are (e.g. for corruption detection), printing a error message and proceeding makes more sense than proceeding without printing anything, at least IMHO. - Some of the static functions in this file preceed main and others follow it. And they use different naming conventions. I suggest putting all of them after main() and using names like check_data_dir, check_data_file_name (instead of validateRelfilenodename; note that the comment for that function is also incorrect), find_max_lsn_in_file, find_max_lsn_in_directory, find_max_lsn_in_pgdata. - Since the patch goes to some pains to exit with 0 if no errors were encountered and 1 if any were, that probably ought to be documented. But I think instead of passing a "result" argument back up the call stack it would be better to just keep a global variable called "errors_encountered". When we encounter an error, bump that value. Then you can just do exit(errors_encountered > 0 ? 1 : 0) and not bother passing all of this stuff around via the return value. The current behavior is inconsistent in another way, too: if you encounter an error while scanning through a particular directory, you finish the whole directory. But if multiple command-line arguments were passed, you don't proceed to any subsequent command-line arguments. I think you should continue always, which the above-mentioned change will take care of basically for free. - The description of the -q option is not very clear. A reader could be forgiven for thinking that the option suppresses all output, which would be quite useless, or at least left in doubt about what output will still be provided. A broader complaint I have with this patch is that it almost but not-quite solves a problem I've had a few times in the past: namely, searching through the data directory for data blocks which have LSNs in the future. This has come up a few times for me, and this tool would make it easier, because I'd be able to run it and look through the output to see which relations have high max-LSN values. However, it wouldn't be quite enough, because it'd only tell me about the block with the highest LSN in each file, whereas what I'd really want to find is every block with an LSN greater than some threshold value. Maybe I'm pushing the envelope too much by trying to fit that into the framework of this patch, but I can't help thinking we're not going to want both pg_computemaxlsn and pg_findlsnsaftersomethreshold that are 95% the same code, so maybe we ought to rename the utility to something slightly more generic than "pg_computemaxlsn". In some ways this is almost a plausible shell for a generic block-corruption checker. If we wanted to validate, for example, that every page in a relation (or a whole data directory) had the expected page version number, we'd want almost exactly this same code structure, just with more checks added in. I'm going to mark this "Returned with Feedback" in the CF app for now.I still think that there's promise in this idea but,on further reflection, committing what we have here now doesn't feel like the right decision. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: