Thread: Review: Patch to compute Max LSN of Data Pages
- All regression tests pass.
On Friday, November 23, 2012 5:38 AM Muhammad Usama wrote:
> Hi Amit
> I have reviewed and tested the patch. Following are my observations and comments.

Thank you for the review. I need some clarification regarding some of the comments.

> Observations and Comments
> -------------------------------------------
> - I think when finding the max LSN of single file the utility should
>   consider all relation segments.

Would you like it to find all relation-related segments, like
12345, 12345.1 ... 12345.n, or
12345, 12345.1 ... and 12345_vm, 12345.1_vm?

But how about if the user asks for 12345.4? Do we need to find all segments greater than 12345.4?

IMHO, as this utility is not aware of relations or any other logical entity and deals only with a file or directory, it is okay to search only that file.

> - For -p {file | dir} option the utility expects the file path relative to
>   the specified data directory path, which makes things a little confusing,
>   for example
>   ./pg_computemaxlsn -p data/base/12286/12200 data
>   pg_computemaxlsn: file or directory "data/base/12286/12200" does not exists
>   Although data/base/12286/12200 is a valid file path, we get a "file does
>   not exist" error.

Would changing the path in the error message to "data/data/base/12286/12200" make things clearer?

> Code Review
> ---------------------
> - While finding the max LSN from single file, you are not verifying that the
>   file actually exists inside the data directory path provided, and can end
>   up looking at the wrong data directory for checking if the server is running.
>   For example:
>   bin/pg_computemaxlsn -p $PWD/data/base/12286/12200 $PWD/otherinstallation/data
>   Maximum LSN found is: 0, WAL segment file name (fileid,seg): 0000000000000000
> - Code is not verifying if the provided data directory is a valid data
>   directory. This could make pg_computemaxlsn compute the max LSN from a
>   running server's files.
>   For example:
>   bin/pg_computemaxlsn -p $PWD/data/base/12286/12200 NONDATADIR/
>   Maximum LSN found is: 0, WAL segment file name (fileid,seg): 0000000000000000

I think both of the above can be addressed if we allow only a relative path for the file or directory and check that it is relative and below the data directory with the function path_is_relative_and_below_cwd().

> Questions
> -----------------
> - I am wondering why you are not following the link inside the "pg_tblspc"
>   directory to get to the tablespace location instead of building the
>   directory path. I am thinking that if you follow the link instead, the
>   utility can be used across different versions of PG.

This is a good suggestion. I shall take care of this in the updated patch.

With Regards,
Amit Kapila.
Amit Kapila escribió:
> > Friday, November 23, 2012 5:38 AM Muhammad Usama wrote:
> > - I think when finding the max LSN of single file the utility should
> >   consider all relation segments.
>
> Would you like to find for all relation related segments:
> Like 12345, 12345.1 ... 12345.n Or
> 12345, 12345.1 ... and 12345_vm, 12345.1_vm
>
> But how about if user asks for 12345.4, do we need to find all greater
> than 12345.4?
>
> IMHO, as this utility is not aware of relation or any other logical
> entity and deals with only file or directory, it is okay to find
> only for that file.

Hmm.  I think I'd expect that if I give pg_computemaxlsn a number, it
should consider that it is a relfilenode, and so it should get a list of
all segments for all forks of it.  So if I give "12345" it should get
12345, 12345.1 and so on, and also 12345_vm, 12345_vm.1 and so on.

However, if what I give it is a path, i.e. it contains a slash, I think
it should only consider the specific file mentioned.  In that light, I'm
not sure that command line options chosen are the best set.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
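A minimal standalone sketch of the expansion described above (not code from
the patch; the helper names and the inclusion of the _fsm fork are my own
assumptions):

#include <stdio.h>
#include <sys/stat.h>

static int
file_exists(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0;
}

/* Expand a bare relfilenode such as "12345" into every existing segment of
 * its main, _fsm and _vm forks, handing each one to the page scanner. */
static void
process_relfilenode(const char *dbdir, const char *relfilenode)
{
    static const char *forks[] = {"", "_fsm", "_vm"};

    for (int f = 0; f < 3; f++)
    {
        for (int seg = 0;; seg++)
        {
            char path[1024];

            if (seg == 0)
                snprintf(path, sizeof(path), "%s/%s%s",
                         dbdir, relfilenode, forks[f]);
            else
                snprintf(path, sizeof(path), "%s/%s%s.%d",
                         dbdir, relfilenode, forks[f], seg);

            if (!file_exists(path))
                break;          /* no more segments of this fork */

            printf("would scan: %s\n", path);   /* stand-in for the real scan */
        }
    }
}

An argument containing a slash would bypass this expansion and be treated as a
literal path, exactly as suggested above.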
On Wednesday, November 28, 2012 3:36 AM Alvaro Herrera wrote:
> Hmm.  I think I'd expect that if I give pg_computemaxlsn a number, it
> should consider that it is a relfilenode, and so it should get a list of
> all segments for all forks of it.  So if I give "12345" it should get
> 12345, 12345.1 and so on, and also 12345_vm, 12345_vm.1 and so on.

Yes, I think this can be done, either by having it as a new option or by
having the code interpret the argument internally as you suggest.
Also, can't we think of it as an extended usecase and implement it on top of
the base version, if this usecase is not urgent?

> However, if what I give it is a path, i.e. it contains a slash, I think
> it should only consider the specific file mentioned.  In that light, I'm
> not sure that command line options chosen are the best set.

Yes, for sure the command line options can be improved/extended based on the
usecase. Please feel free to give suggestions regarding the command line
options if you feel the current ones are not extendable.
One way is that -p should be for a file path, and maybe -r for a relfilenode
number.

With Regards,
Amit Kapila.
On Friday, November 23, 2012 5:38 AM Muhammad Usama wrote:
> Observations and Comments
> -------------------------------------------
> - If no option is given to pg_computemaxlsn utility then we get a wrong
> error message
> ./pg_computemaxlsn ../data
> pg_computemaxlsn: file or directory "(null)" exists
> In my opinion we should consider the -P (Print max LSN from whole data
> directory) as default in case no option is specified, or otherwise throw a
> more descriptive error.
Fixed, by treating the whole data directory as the default option.
> - Options -p and -P should be mutually exclusive
Fixed. Both options will not be allowed at the same time.
> - Long options missing for -p and -P
Fixed. Introduced the long options --path and --data-directory.
> - Help string for -P should be something like "print max LSN from whole
> data directory" instead of "print max LSN from whole database"
Fixed, by changing the message as suggested.
> - Help menu is silent about -V or --version option
Fixed.
> - I think when finding the max LSN of single file the utility should
> consider all relation segments.
IMHO, as this utility is not aware of relations or any other logical entity and deals only with a file or directory, it is okay to search only that file.
> - There is no check for extra arguments
> e.g
> ./pg_computemaxlsn -P . ../data/ ../data/base/ ../data2/
Fixed.
- Extra options now give an error.
- Multiple -p options also give an error.
E.g.:
./pg_computemaxlsn -p base/12286/12200 -p base/12286/12201 data
> - For -p {file | dir} option the utility expects the file path relative to
> the specified data directory path which makes thing little confusing
> for example
> ./pg_computemaxlsn -p data/base/12286/12200 data
> pg_computemaxlsn: file or directory "data/base/12286/12200" does not exists
> although data/base/12286/12200 is valid file path but we gets file does not
> exists error
As the path of the file is relative to the data directory, that's why the error prints the path as "data/base/12286/12200".
Do you have any suggestion about what should be done for this?
> - Utility should print the max LSN number in hexadecimal notation and LSN
> format (logid/segno) for consistency with PG,
Fixed.
> and may also print the file name which contains that LSN.
It might be confusing to users in the cases where the utility has to follow a link (pg_tblspc) and then print some absolute path which is not related to the relative path given by the user. Also, I am not sure it would be of any use to the user.
> - When finding max LSN from directory, the utility does not considers the
> sub directories, which I think could be useful in some cases, like if we
> want to find the max LSN in tablespace directory. With the current
> implementation we would require to run the utility multiple times, once for
> each database.
It will consider sub-directories as well.
> Code Review
> ---------------------
> Code generally looks good, I have few small points.
> - In main() function variable maxLSN is defined as uint64,
> Although XLogRecPtr and uint64 are the same thing, but I think we should
> use XLogRecPtr instead.
Changed as per suggestion.
> - Just a small thing although will hardly improve anything but still for
> sake of saving few cycles, I think you should use XLByteLT ( < ) instead
> of XLByteLE ( <= )
> to find if the previous max LSN is lesser then current.
Changed as per suggestion.
> - '\n' is missing from one the printf in the code :-)
> fprintf(stderr, _("%s: file or directory \"%s\" does not exists"),
> progname, LsnSearchPath);
Changed as per suggestion.
> - While finding the max LSN from single file, you are not verifying that
> the file actually exists inside the data directory path provided. and can
> end up looking
>at wrong data directory for checking if the server is running.
> For example:
> bin/pg_computemaxlsn -p $PWD/data/base/12286/12200
> $PWD/otherinstallation/data
> Maximum LSN found is: 0, WAL segment file name (fileid,seg):
> 0000000000000000
> - Code is not verifying, if the provided data directory is the valid data
> directory, This could make pg_computemaxlsn to compute max LSN from the
> running server file.
> For example:
> bin/pg_computemaxlsn -p $PWD/data/base/12286/12200 NONDATADIR/
> Maximum LSN found is: 0, WAL segment file name (fileid,seg):
> 0000000000000000
Fixed, by allowing only a relative path for the file or directory and checking that it is relative and below the data directory with the function path_is_relative_and_below_cwd().
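For illustration only, a standalone sketch of that kind of validation follows.
It is not the patch's code and the function name here is made up; the real
check uses path_is_relative_and_below_cwd(), but the idea is the same: reject
absolute paths and any ".." component, so the target cannot escape the
directory it is resolved against.

#include <stdbool.h>
#include <string.h>

static bool
path_stays_below(const char *path)
{
    const char *p = path;

    if (path[0] == '/')
        return false;           /* absolute path */

    /* walk the path component by component, refusing ".." */
    while (*p)
    {
        const char *slash = strchr(p, '/');
        size_t      len = slash ? (size_t) (slash - p) : strlen(p);

        if (len == 2 && strncmp(p, "..", 2) == 0)
            return false;
        p += len;
        if (*p == '/')
            p++;
    }
    return true;
}

The caller would reject the argument, with the "does not exist" style error
shown above, before opening anything if this returns false.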
> Questions
> -----------------
> - I am wondering why are you not following the link inside the "pg_tblspc"
> directory to get to the table space location instead of building the
> directory path. I am thinking if you follow the link instead, The utility
> can be used across the different versions of PG.
Changed as per suggestion.
Apart from the above, I have updated the documentation.
With Regards,
Amit Kapila.
On Wednesday, November 28, 2012 11:49 AM Muhammad Usama wrote:
> On Tue, Nov 27, 2012 at 5:52 PM, Amit kapila <amit.kapila@huawei.com> wrote:
>> Friday, November 23, 2012 5:38 AM Muhammad Usama wrote:
>>> - For -p {file | dir} option the utility expects the file path relative to
>>> the specified data directory path which makes thing little confusing
>>> for example
>>> ./pg_computemaxlsn -p data/base/12286/12200 data
>>> pg_computemaxlsn: file or directory "data/base/12286/12200" does not exists
>>> although data/base/12286/12200 is valid file path but we gets file does not
>>> exists error
>>
>> As the path of file is relative to data directory, that's why in error it
>> prints the path as "data/base/12286/12200".
>> Do you have any suggestion what should be done for this?
>
> I think we should expect provided path to be relative to current directory
> or may consider it to be relative to either one of Data or CWD.
> Because normally we expect path to be relative to CWD if some program is
> asking for path in command line. And also tab-complete doesn't work on the
> terminal if the path is not absolute or relative to the current directory.
> To handle this I think we should not change current directory to the data
> directory and generate the file path (like for postmaster.pid) where
> required by appending data directory path.
> And before throwing an error should check if path is valid for either DATA
> or CWD, something like

Apart from that, I think it also needs to check that the path is under the
data directory path in either case, else it can get the LSN for some running
server, as pointed out in your other point.

Also, shouldn't we support the path being relative to one of CWD or the data
directory rather than both? Any opinions from others on what is best to
support in such a use case:
a. File/folder path relative to CWD (current working directory)
b. File/folder path relative to the data directory
c. both a and b

With Regards,
Amit Kapila.
On Wednesday, November 28, 2012 11:49 AM Muhammad Usama wrote:
> I think we should expect provided path to be relative to current directory
> or may consider it to be relative to either one of Data or CWD.
> Because normally we expect path to be relative to CWD if some program is
> asking for path in command line.

Please find the attached patch to make the path relative to CWD and check if the path is under the data directory.
Now the only point raised by Alvaro and you for this patch which is not yet addressed is:
> Hmm. I think I'd expect that if I give pg_computemaxlsn a number, it
> should consider that it is a relfilenode, and so it should get a list of
> all segments for all forks of it. So if I give "12345" it should get
> 12345, 12345.1 and so on, and also 12345_vm 12345_vm.1 and so on.
> However, if what I give it is a path, i.e. it contains a slash, I think
> it should only consider the specific file mentioned. In that light, I'm
> not sure that command line options chosen are the best set.

I am just not sure whether we should handle this functionality and, if we do, what the better way is to provide it to the user. Shall we provide a new option, -r or something, for it? Opinions/Suggestions?
IMHO, such functionality can easily be added in the future. However, I have no problem implementing it now if you are of the opinion that this is basic and should go in with the first version of the feature.

With Regards,
Amit Kapila.
Hi Muhammad,

On Friday, December 07, 2012 7:43 PM Muhammad Usama wrote:
> Hi Amit
>
> On Mon, Dec 3, 2012 at 6:56 PM, Amit kapila <amit.kapila@huawei.com> wrote:
>> Please find the attached patch to make the path relative to CWD and check
>> if the path is under the data directory.
>
> Works good now.

Thank you for the verification.

> Although I am thinking why are you disallowing the absolute path of file.
> Any particular reason?

The reason to disallow an absolute path is that we need to test on multiple
platforms and want to keep the scope a little smaller. I thought we could
allow absolute paths in the future.

> I also had a similar point made by Alvaro to allow all the segments of the
> relation for a given relation file name, or add another option to do the
> same. But if everybody is fine with leaving it for the future, I do not
> have any further concerns with the patch. It is good from my side.

In my opinion we can extend the utility in the future for both of the points
suggested:
1. allow absolute paths in the file path
2. allow getting the max LSN for relation segments

If you are also okay, then we can proceed and let the committer also share
his opinion.

Thank you for reviewing the patch.

With Regards,
Amit Kapila.
Amit kapila <amit.kapila@huawei.com> writes:
> On Friday, December 07, 2012 7:43 PM Muhammad Usama wrote:
>> Although I am thinking why are you disallowing the absolute path of file.
>> Any particular reason?

> The reason to disallow an absolute path is that we need to test on multiple
> platforms and want to keep the scope a little smaller.

This argument seems to me to be completely nuts.  What's wrong with an
absolute path?  Wouldn't you have to go out of your way to disallow it?

            regards, tom lane
On Saturday, December 08, 2012 9:44 AM Tom Lane wrote:
> Amit kapila <amit.kapila@huawei.com> writes:
>> The reason to disallow an absolute path is that we need to test on multiple
>> platforms and want to keep the scope a little smaller.
>
> This argument seems to me to be completely nuts.  What's wrong with an
> absolute path?  Wouldn't you have to go out of your way to disallow it?

There's nothing wrong with an absolute path. I have updated the patch to work
with absolute paths as well. The updated patch is attached to this mail.

With Regards,
Amit Kapila.
Please find the rebased Patch for Compute MAX LSN.
There was one compilation error, "undefined reference to XLByteLT", because earlier XLogRecPtr was a structure:
typedef struct XLogRecPtr
{
uint32 xlogid; /* log file #, 0 based */
uint32 xrecoff; /* byte offset of location in log file */
} XLogRecPtr;
So, in order to compare two LSNs, there was a macro XLByteLT that compared both fields.
But now XLogRecPtr has been changed to a plain uint64, so XLByteLT has been removed.
The change, therefore, is to replace XLByteLT(*maxlsn, pagelsn) with (*maxlsn < pagelsn).
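To make the change concrete, here is a compilable standalone sketch (not the
patch itself; the block size, the page-reading loop and the output format are
illustrative assumptions). It reads the page LSN from the first 8 bytes of
each 8 kB block and keeps the maximum, using the plain "<" comparison that now
replaces XLByteLT:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192                    /* default PostgreSQL block size */

typedef uint64_t XLogRecPtr;           /* now a plain 64-bit integer */

/* The page LSN is stored at the start of the page header as two 32-bit
 * halves (xlogid, xrecoff); combine them into one 64-bit value.  This
 * assumes the file was written on a machine of the same byte order. */
static XLogRecPtr
page_lsn(const char *page)
{
    uint32_t xlogid, xrecoff;

    memcpy(&xlogid, page, sizeof(uint32_t));
    memcpy(&xrecoff, page + sizeof(uint32_t), sizeof(uint32_t));
    return ((XLogRecPtr) xlogid << 32) | xrecoff;
}

static void
scan_file(FILE *fp, XLogRecPtr *maxlsn)
{
    char page[BLCKSZ];

    while (fread(page, 1, BLCKSZ, fp) == BLCKSZ)
    {
        XLogRecPtr pagelsn = page_lsn(page);

        /* was: if (XLByteLT(*maxlsn, pagelsn)) */
        if (*maxlsn < pagelsn)
            *maxlsn = pagelsn;
    }
}

int
main(int argc, char **argv)
{
    XLogRecPtr maxlsn = 0;

    for (int i = 1; i < argc; i++)
    {
        FILE *fp = fopen(argv[i], "rb");

        if (fp == NULL)
            continue;
        scan_file(fp, &maxlsn);
        fclose(fp);
    }
    /* conventional logid/offset display: split into two 32-bit halves */
    printf("Maximum LSN found is: %X/%X\n",
           (unsigned int) (maxlsn >> 32), (unsigned int) maxlsn);
    return 0;
}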
Muhammad, can you verify that everything is okay? If so, this can be marked as "Ready for Committer".
With Regards,
Amit Kapila.
2013/1/18 Amit kapila <amit.kapila@huawei.com>:
> Please find the rebased Patch for Compute MAX LSN.

Couldn't the function 'remove_parent_refernces' be called
'remove_parent_references'?

Why not an extension on PGXN instead of a contrib module?

Regards,
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br
On Sunday, January 20, 2013 4:04 AM Dickson S. Guedes wrote:
> Couldn't the function 'remove_parent_refernces' be called
> 'remove_parent_references'?

Shall fix this.

> Why not an extension on PGXN instead of a contrib module?

This functionality has similarity to pg_resetxlog, so we thought of putting it
either in bin or in contrib. Finally, based on suggestions from other
community members, we have added it to contrib.

With Regards,
Amit Kapila.
2013/1/20 Amit kapila <amit.kapila@huawei.com>:
>> Why not an extension on PGXN instead of a contrib module?
>
> This functionality has similarity to pg_resetxlog, so we thought of putting
> it either in bin or in contrib. Finally, based on suggestions from other
> community members, we have added it to contrib.

Indeed.

--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br
On Sunday, January 20, 2013 10:50 AM Amit kapila wrote:
> On Sunday, January 20, 2013 4:04 AM Dickson S. Guedes wrote:
>> Couldn't the function 'remove_parent_refernces' be called
>> 'remove_parent_references'?
>
> Shall fix this.

Fixed in attached version.

With Regards,
Amit Kapila.
Hackers,

Amit posted a new version of this patch on January 23rd.  But the last comment
on it by Tom is "not sure everyone wants this".

https://commitfest.postgresql.org/action/patch_view?id=905

... so, what's the status of this patch?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Hi,

On 2013-06-16 17:19:49 -0700, Josh Berkus wrote:
> Amit posted a new version of this patch on January 23rd.  But the last
> comment on it by Tom is "not sure everyone wants this".
>
> https://commitfest.postgresql.org/action/patch_view?id=905
>
> ... so, what's the status of this patch?

That comment was referencing a mail of mine - so perhaps I better explain:

I think the usecase for this utility isn't big enough to be included in
postgres since it really can only help in very limited circumstances. And I
think it's too likely to be misused for stuff it's not useable for
(e.g. remastering).

The only scenario I see is that somebody deleted/corrupted pg_controldata. In
that scenario the tool is supposed to be used to find the biggest LSN used so
far, so the user can then use pg_resetxlog to set that as the WAL starting
point. But that can be solved much more easily by just setting the LSN to
something very, very high. The database cannot be used for anything reliable
afterwards anyway.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tuesday, June 25, 2013 11:12 PM Andres Freund wrote:
> The only scenario I see is that somebody deleted/corrupted pg_controldata.
> In that scenario the tool is supposed to be used to find the biggest LSN
> used so far, so the user can then use pg_resetxlog to set that as the WAL
> starting point. But that can be solved much more easily by just setting the
> LSN to something very, very high. The database cannot be used for anything
> reliable afterwards anyway.

One of the main reasons this was written was to bring the server up in case of
corruption so that the user can take a dump of whatever useful information
remains. By setting the LSN very, very high the user might lose the
information he wants to dump.

So I think in such scenarios it can be quite helpful to users, but such
scenarios are rare, so it might not be a utility which users will need often.

With Regards,
Amit Kapila.
On 2013-06-26 08:50:27 +0530, Amit Kapila wrote:
> One of the main reasons this was written was to bring the server up in case
> of corruption so that the user can take a dump of whatever useful
> information remains. By setting the LSN very, very high the user might lose
> the information he wants to dump.

Which information would that lose? We don't currently use the LSN for tuple
visibility. And you sure wouldn't do anything but dump such a cluster.

Now you could argue that you could modify this to find the current xid used -
but that's not that easy due to the wraparound semantics of xids. And you are
more likely to be successful by looking at pg_clog.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wednesday, June 26, 2013 1:20 PM Andres Freund wrote:
> Which information would that lose?

Information from WAL replay, which can be more appropriate by selecting the
LSN. Also, for a developer, guessing a very high LSN might be easy, but for
users it might not be equally easy, and getting such a value from a utility
would be more comfortable.

One more use case for which this utility was done is as below:
It will be used to decide, on the new-standby (old-master), whether a full
backup is needed from the new-master (old-standby). The backup is required
when a data page in the old-master precedes the last applied LSN in the
old-standby (i.e., the new-master) at the moment of the failover.

With Regards,
Amit Kapila.
Hi Amit,

On 2013-06-26 16:22:28 +0530, Amit Kapila wrote:
> Information from WAL replay, which can be more appropriate by selecting the
> LSN.

Sorry, I can't follow. If WAL replay is still an option you can just look at
the WAL and get a sensible value far more easily. The whole tool seems to only
make sense if you've lost pg_xlog.

> Also, for a developer, guessing a very high LSN might be easy, but for users
> it might not be equally easy, and getting such a value from a utility would
> be more comfortable.

Well, then we can just document some very high LSN and be done with it. Like
CF000000/00000000. That would leave enough space for eventual writes caused
while dumping the database (say hint bit writes in a checksummed database) and
cannot yet realistically be reached during normal operation.

> One more use case for which this utility was done is as below:
> It will be used to decide, on the new-standby (old-master), whether a full
> backup is needed from the new-master (old-standby). The backup is required
> when a data page in the old-master precedes the last applied LSN in the
> old-standby (i.e., the new-master) at the moment of the failover.

That's exactly what I was afraid of. Unless I miss something, the tool is
*NOT* sufficient to do this. Look at the mail introducing pg_rewind and the
ensuing discussion for what's necessary for that.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wednesday, June 26, 2013 4:40 PM Andres Freund wrote:
> Sorry, I can't follow. If WAL replay is still an option you can just look at
> the WAL and get a sensible value far more easily.

Originally two parts were proposed: one was to get the LSN from data pages and
the other from WAL. The original proposal is:
http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382851FFA1@szxeml509-mbs
The second part, for looking into WAL, was written but postponed due to the
xlogreader patch, and I didn't get time after that to pursue it.

> The whole tool seems to only make sense if you've lost pg_xlog.

The tool's initial intent was for the case where pg_controldata is lost; the
idea originated in the mail chain below:
http://www.postgresql.org/message-id/4274.1340084598@sss.pgh.pa.us

> Well, then we can just document some very high LSN and be done with it. Like
> CF000000/00000000. That would leave enough space for eventual writes caused
> while dumping the database (say hint bit writes in a checksummed database)
> and cannot yet realistically be reached during normal operation.

Can we be ultra sure that this LSN is not reached? I think it will take very
long to reach such an LSN, but theoretically it is still possible. I don't
have any evidence.

> > One more use case for which this utility was done is as below:
> > It will be used to decide, on the new-standby (old-master), whether a full
> > backup is needed from the new-master (old-standby).
>
> That's exactly what I was afraid of. Unless I miss something, the tool is
> *NOT* sufficient to do this.

You mean to say that if the user knows the max LSN of data pages in the
old-master and the last applied LSN in the new-master, he cannot decide
whether a full backup is needed? It should be a straightforward decision to
skip a backup if that old-master LSN is less than the new-master's (i.e., the
last applied LSN, IOW, the timeline switch LSN).
It was proposed as a usecase in the mail below:
http://www.postgresql.org/message-id/CAHGQGwHyd1fY0hF0qKh0-uKDh-gcbYxMOFBYVkKh4jzji-FCfg@mail.gmail.com

> Look at the mail introducing pg_rewind and the ensuing discussion for what's
> necessary for that.

I had briefly looked into that discussion at the time it was going on, but I
will look into it more carefully.

With Regards,
Amit Kapila.
On Wed, Jun 26, 2013 at 8:57 PM, Amit Kapila <amit.kapila@huawei.com> wrote:
> You mean to say that if the user knows the max LSN of data pages in the
> old-master and the last applied LSN in the new-master, he cannot decide
> whether a full backup is needed? It should be a straightforward decision to
> skip a backup if that old-master LSN is less than the new-master's (i.e.,
> the last applied LSN, IOW, the timeline switch LSN).
> It was proposed as a usecase in the mail below:
> http://www.postgresql.org/message-id/CAHGQGwHyd1fY0hF0qKh0-uKDh-gcbYxMOFBYVkKh4jzji-FCfg@mail.gmail.com

I guess he meant the commit hint bit problem.

Regards,

--
Fujii Masao
On Wednesday, June 26, 2013 10:19 PM Fujii Masao wrote:
> On Wed, Jun 26, 2013 at 8:57 PM, Amit Kapila <amit.kapila@huawei.com> wrote:
> > You mean to say that if the user knows the max LSN of data pages in the
> > old-master and the last applied LSN in the new-master, he cannot decide
> > whether a full backup is needed?
>
> I guess he meant the commit hint bit problem.

True. After reading the thread mentioned by Andres, I got the reason he was
pointing out why it is not sufficient.

So can it be useful if the database has checksums enabled?

With Regards,
Amit Kapila.
On 2013-06-27 11:16:25 +0530, Amit Kapila wrote:
> True. After reading the thread mentioned by Andres, I got the reason he was
> pointing out why it is not sufficient.
> So can it be useful if the database has checksums enabled?

I think for that usecase it's far more useful to work on getting pg_rewind,
since that has a chance of working when local WAL has been applied that hadn't
yet shipped to the other side (which is frequently the case).

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thursday, June 27, 2013 11:26 AM Andres Freund wrote: > On 2013-06-27 11:16:25 +0530, Amit Kapila wrote: > > On Wednesday, June 26, 2013 10:19 PM Fujii Masao wrote: > > > On Wed, Jun 26, 2013 at 8:57 PM, Amit Kapila > <amit.kapila@huawei.com> > > > >> > One more use case for which this utility was done is as > below: > > > >> > It will be used to decide that on new-standby (old-master) > > > whether > > > >> a full > > > >> > backup is needed from > > > >> > New-master(old-standby). > > > >> > The backup is required when the data page in old-master > precedes > > > >> > the last applied LSN in old-standby (i.e., new-master) at > the > > > >> moment > > > >> > of the failover. > > > >> > > > >> That's exactly what I was afraid of. Unless I miss something the > > > tool > > > >> is > > > >> *NOT* sufficient to do this. > > > > > > > > You mean to say if user knows the max LSN of data pages in old- > master > > > and > > > > last applied LSN in new master, he cannot decide whether > > > > Full backup is needed? It should be straightforward decision that > > > skip a > > > > backup if that old-master LSN is less than the new-master (i.e., > last > > > > applied LSN, IOW, timeline switch LSN). > > > > It was proposed as a usecase in this below mail: > > > > http://www.postgresql.org/message-id/CAHGQGwHyd1fY0hF0qKh0-uKDh- > > > gcbYxMOFBYVk > > > > Kh4jzji-FCfg@mail.gmail.com > > > > > > I guess he meant the commit hint bit problem. > > > > True, after reading the thread mentioned by Andres, I got the reason > he was > > pointing why it is not sufficient. > > So can it be useful if database has checksums enabled? > > I think for that usecase its far more useful to work on getting > pg_rewind since that has a chance of working when local WAL has been > applied that hadn't yet shipped to the other side (which is frequently > the case). Aren't the use case for both is bit different Pg_computmaxlsn - by computing max lsn for checksums enabled database, user can made old-master follow new-master if maxlsn suggests that fullbackup is not required. Pg_rewind - a tool to resynchronize old-master and new-master by copying changed blocks from new master. I think more work might be needed in case DDL's happened on old-master after forking of new-master. Although for this case, both have resemblance in terms of avoiding full backup, but I think maxlsn tool can be independently also used. Do you think pg_rewind can be used by default for any checksum enabled database to resynchronize old-master? With Regards, Amit Kapila.
On Tue, Jun 25, 2013 at 1:42 PM, Andres Freund <andres@2ndquadrant.com> wrote: > I think the usecase for this utility isn't big enough to be included in > postgres since it really can only help in a very limited > circumstances. And I think it's too likely to be misused for stuff it's > not useable for (e.g. remastering). > > The only scenario I see is that somebody deleted/corrupted > pg_controldata. In that scenario the tool is supposed to be used to find > the biggest lsn used so far so the user then can use pg_resetxlog to set > that as the wal starting point. > But that can be way much easier solved by just setting the LSN to > something very, very high. The database cannot be used for anything > reliable afterwards anyway. I guess this is true, but I think I'm mildly in favor of including this anyway. I think I would have used it once or twice, if it had been there - maybe not even to feed into pg_resetxlog, but just to check for future LSNs. We don't have anything like a suite of diagnostic tools in bin or contrib today, for use by database professionals in fixing things that fall strictly in the realm of "don't try this at home", and I think we should have such a thing. Unfortunately this covers about 0.1% of the space I'd like to see covered, which might be a reason to reject it or at least insist that it be enhanced first. At any rate, I don't think this is anywhere near committable as it stands today. Some random review comments: remove_parent_refernces() is spelled wrong. Why does this patch need all of this fancy path-handling logic - specifically, remove_parent_refernces() and make_absolute_path()? Surely its needs are not that different from pg_ctl or pg_resetxlog or pg_controldata. If they all need it and it's duplicated, we should fix that. Otherwise, why the special logic here? I don't think we need getLinkPath() either. The server has no trouble finding its files by just using a pathname that follows the symlink. Why do we need anything different here? The whole point of symlinks is that you can traverse them transparently, without knowing where they point. Duplicating PageHeaderIsValid doesn't seem acceptable. Moreover, since the point of this is to be able to use it on a damaged cluster, why is that logic even desirable? I think we'd be better off assuming the pages to be valid. The calling convention for this utility seems quite baroque. There's no real need for all of this -p/-P stuff. I think the syntax should just be: pg_computemaxlsn file_or_directory... For each argument, we determine whether it's a file or a directory. If it's a file, we assume it's a PostgreSQL data file and scan it. If it's a directory, we check whether it looks like a data directory. If it does, we recurse through the whole tree structure and find the data files, and process them. If it doesn't look like a data directory, we scan each plain file in that directory whose name looks like a PostgreSQL data file name. With this approach, there's no need to limit ourselves to a single input argument and no need to specify what kind of argument it is; the tool just figures it out. I think it would be a good idea to have a mode that prints out the max LSN found in *each data file* scanned, and then prints out the overall max found at the end. In fact, I think that should perhaps be the default mode, with -q, --quiet to disable it. When printing out the per-file data, I think it would be useful to track and print the block number where the highest LSN in that file was found. 
I have definitely had cases where I suspected, but was not certain of, corruption. One could use a tool like this to hunt for problems, and then use something like pg_filedump to dump the offending blocks. That would be a lot easier than running pg_filedump on the whole file and grepping through the output. Similarly, I see no reason for the restriction imposed by check_path_belongs_to_pgdata(). I've had people mail me one data file; why shouldn't I be able to run this tool on it? It's a read-only utility. - if (0 != FindMaxLSNinDir(pathbuf, maxlsn, false)) and similar is not PostgreSQL style. + printf(_("%s compute the maximum LSN in PostgreSQL data pages.\n\n"), progname); s/compute/computes/ + /* + * Don't allow pg_computemaxlsn to be run as root, to avoid overwriting + * the ownership of files in the data directory. We need only check for + * root -- any other user won't have sufficient permissions to modify + * files in the data directory. + */ + #ifndef WIN32 + if (geteuid() == 0) + { + fprintf(stderr, _("%s: cannot be executed by \"root\".\n"), + progname); + fprintf(stderr, _("You must run %s as the PostgreSQL superuser.\n"), + progname); + exit(1); + } + #endif This utility only reads files; it does not modify them. So this seems unnecessary. I assume it's blindly copied from somewhere else. + fprintf(stderr, _("%s: \"%s\" not a valid data directory.\n"), Extra space. + /* + * Check for a postmaster lock file --- if there is one, refuse to + * proceed, on grounds we might be interfering with a live installation. + */ + snprintf(path, MAXPGPATH, "%s/postmaster.pid", DataDir); Again, this might be appropriate for pg_resetxlog, but I see no reason for the restriction here. The output might be inaccurate in that case, but if you're using this tool you're required to know what you're doing. + For safety reasons, you must specify the data directory on the command line. + <command>pg_computemaxlsn</command> does not use the environment variable + <envar>PGDATA</>. Same thing here. I think using PGDATA would be quite appropriate for this utility. + <para> + This command must not be used when the server is + running. <command>pg_computemaxlsn</command> will refuse to start up if + it finds a server lock file in the data directory. If the + server crashed then a lock file might have been left + behind; in that case you can remove the lock file to allow + <command>pg_computemaxlsn</command> to run. But before you do + so, make doubly certain that there is no server process still alive. + </para> More of the same paranoia. Overall my feeling is that this can be simplified quite a lot. There's a lot of stuff in here that's really not needed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
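To make the proposed calling convention concrete, a minimal, hypothetical sketch of the dispatch loop being described could look like the following. The helper names looks_like_data_dir, scan_file and scan_directory are invented for illustration and are not code from the patch; the real scanners would track the max LSN found.

    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/stat.h>

    /*
     * Hypothetical helper: a directory is treated as a data directory if it
     * contains global/pg_control.
     */
    static bool
    looks_like_data_dir(const char *path)
    {
        char        buf[4096];
        struct stat st;

        snprintf(buf, sizeof(buf), "%s/global/pg_control", path);
        return stat(buf, &st) == 0;
    }

    /* Placeholder scanners; the real ones would compute and report max LSNs. */
    static bool
    scan_file(const char *path)
    {
        printf("scanning data file \"%s\"\n", path);
        return true;
    }

    static bool
    scan_directory(const char *path, bool is_data_dir)
    {
        printf("scanning %s \"%s\"\n",
               is_data_dir ? "data directory" : "directory", path);
        return true;
    }

    int
    main(int argc, char **argv)
    {
        bool        ok = true;
        int         i;

        for (i = 1; i < argc; i++)
        {
            struct stat st;

            if (stat(argv[i], &st) != 0)
            {
                fprintf(stderr, "could not stat \"%s\"\n", argv[i]);
                ok = false;
                continue;           /* report the problem and keep going */
            }

            if (S_ISREG(st.st_mode))
                ok = scan_file(argv[i]) && ok;      /* assume a PostgreSQL data file */
            else if (S_ISDIR(st.st_mode))
                ok = scan_directory(argv[i], looks_like_data_dir(argv[i])) && ok;
        }

        return ok ? 0 : 1;
    }

The useful property here is that classification happens per argument, so mixing individual files, relation directories, and a whole data directory in a single invocation just works, with no mode switches needed.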
On Wednesday, July 03, 2013 1:26 AM Robert Haas wrote: > On Tue, Jun 25, 2013 at 1:42 PM, Andres Freund <andres@2ndquadrant.com> > wrote: > > I think the usecase for this utility isn't big enough to be included > in > > postgres since it really can only help in a very limited > > circumstances. And I think it's too likely to be misused for stuff > it's > > not useable for (e.g. remastering). > > > > The only scenario I see is that somebody deleted/corrupted > > pg_controldata. In that scenario the tool is supposed to be used to > find > > the biggest lsn used so far so the user then can use pg_resetxlog to > set > > that as the wal starting point. > > But that can be way much easier solved by just setting the LSN to > > something very, very high. The database cannot be used for anything > > reliable afterwards anyway. > > I guess this is true, but I think I'm mildly in favor of including > this anyway. I think I would have used it once or twice, if it had > been there - maybe not even to feed into pg_resetxlog, but just to > check for future LSNs. We don't have anything like a suite of > diagnostic tools in bin or contrib today, for use by database > professionals in fixing things that fall strictly in the realm of > "don't try this at home", and I think we should have such a thing. > Unfortunately this covers about 0.1% of the space I'd like to see > covered, which might be a reason to reject it or at least insist that > it be enhanced first. > > At any rate, I don't think this is anywhere near committable as it > stands today. Some random review comments: > > remove_parent_refernces() is spelled wrong. It was corrected in last posted version. http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C383BEB928 8@szxeml509-mbx > Why does this patch need all of this fancy path-handling logic - > specifically, remove_parent_refernces() and make_absolute_path()? > Surely its needs are not that different from pg_ctl or pg_resetxlog or > pg_controldata. If they all need it and it's duplicated, we should > fix that. Otherwise, why the special logic here? > > I don't think we need getLinkPath() either. The server has no trouble > finding its files by just using a pathname that follows the symlink. > Why do we need anything different here? The whole point of symlinks > is that you can traverse them transparently, without knowing where > they point. It is to handle negative scenario's like if there is any recursion in path. However if you feel this is not important, it can be removed. > Duplicating PageHeaderIsValid doesn't seem acceptable. Moreover, > since the point of this is to be able to use it on a damaged cluster, > why is that logic even desirable? I think we'd be better off assuming > the pages to be valid. > > The calling convention for this utility seems quite baroque. There's > no real need for all of this -p/-P stuff. I think the syntax should > just be: > > pg_computemaxlsn file_or_directory... > > For each argument, we determine whether it's a file or a directory. > If it's a file, we assume it's a PostgreSQL data file and scan it. If > it's a directory, we check whether it looks like a data directory. If > it does, we recurse through the whole tree structure and find the data > files, and process them. If it doesn't look like a data directory, we > scan each plain file in that directory whose name looks like a > PostgreSQL data file name. 
With this approach, there's no need to > limit ourselves to a single input argument and no need to specify what > kind of argument it is; the tool just figures it out. > > I think it would be a good idea to have a mode that prints out the max > LSN found in *each data file* scanned, and then prints out the overall > max found at the end. In fact, I think that should perhaps be the > default mode, with -q, --quiet to disable it. Printing an LSN for every file might fill the user's screen when all that is needed is the overall LSN. Should we keep -p/--printall as an option to print the per-file LSNs and make the overall max LSN the default output? With Regards, Amit Kapila.
On Wed, Jul 3, 2013 at 8:44 AM, Amit Kapila <amit.kapila@huawei.com> wrote: >> Why does this patch need all of this fancy path-handling logic - >> specifically, remove_parent_refernces() and make_absolute_path()? >> Surely its needs are not that different from pg_ctl or pg_resetxlog or >> pg_controldata. If they all need it and it's duplicated, we should >> fix that. Otherwise, why the special logic here? >> >> I don't think we need getLinkPath() either. The server has no trouble >> finding its files by just using a pathname that follows the symlink. >> Why do we need anything different here? The whole point of symlinks >> is that you can traverse them transparently, without knowing where >> they point. > > It is to handle negative scenario's like if there is any recursion in path. > However if you feel this is not important, it can be removed. I'm having a hard time imagining a situation where that would be a problem. If the symlink points to itself somehow, the OS will throw an error. If your filesystem is so badly hosed that the directory structure is more fundamentally broken than the OS's circular-symlink detection code can handle, whether or not this utility works is a second-order consideration. What kind of scenario are you imagining? >> I think it would be a good idea to have a mode that prints out the max >> LSN found in *each data file* scanned, and then prints out the overall >> max found at the end. In fact, I think that should perhaps be the >> default mode, with -q, --quiet to disable it. > > Printing too many LSN for each file might fill user's screen and he might be > needing only overall LSN. > Should we keep -p --printall as option to print all LSN's and keep default > as overall max LSN? Honestly, I have a hard time imagining the use case for that mode. This isn't a tool that people should be running regularly, and some output that lends a bit of confidence that the tool is doing the right thing seems like a good thing. Keep in mind it's likely to run for quite a while, too, and this would provide a progress indicator. I'll defer to whatever the consensus is here but my gut feeling is that if you don't want that extra output, there's a good chance you're misusing the tool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> -----Original Message----- > From: Robert Haas [mailto:robertmhaas@gmail.com] > Sent: Wednesday, July 03, 2013 6:40 PM > To: Amit Kapila > Cc: Andres Freund; Josh Berkus; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Review: Patch to compute Max LSN of Data Pages > > On Wed, Jul 3, 2013 at 8:44 AM, Amit Kapila <amit.kapila@huawei.com> > wrote: > >> Why does this patch need all of this fancy path-handling logic - > >> specifically, remove_parent_refernces() and make_absolute_path()? > >> Surely its needs are not that different from pg_ctl or pg_resetxlog > or > >> pg_controldata. If they all need it and it's duplicated, we should > >> fix that. Otherwise, why the special logic here? > >> > >> I don't think we need getLinkPath() either. The server has no > trouble > >> finding its files by just using a pathname that follows the symlink. > >> Why do we need anything different here? The whole point of symlinks > >> is that you can traverse them transparently, without knowing where > >> they point. > > > > It is to handle negative scenario's like if there is any recursion in > path. > > However if you feel this is not important, it can be removed. > > I'm having a hard time imagining a situation where that would be a > problem. If the symlink points to itself somehow, the OS will throw > an error. If your filesystem is so badly hosed that the directory > structure is more fundamentally broken than the OS's circular-symlink > detection code can handle, whether or not this utility works is a > second-order consideration. What kind of scenario are you imagining? amit@linux:~> md test amit@linux:~> cd test amit@linux:~/test> ln -s ~/test link_test amit@linux:~/test> ls link_test amit@linux:~/test> cd link_test amit@linux:~/test/link_test> ls link_test amit@linux:~/test/link_test> cd link_test amit@linux:~/test/link_test/link_test> cd link_test amit@linux:~/test/link_test/link_test/link_test> pwd /home/amit/test/link_test/link_test/link_test amit@linux:~/test/link_test/link_test/link_test> Platform details ---------------- Suse - 11.2 Kernel - 3.0.13 This is to avoid when user has given some path where db files are present. > >> I think it would be a good idea to have a mode that prints out the > max > >> LSN found in *each data file* scanned, and then prints out the > overall > >> max found at the end. In fact, I think that should perhaps be the > >> default mode, with -q, --quiet to disable it. > > > > Printing too many LSN for each file might fill user's screen and he > might be > > needing only overall LSN. > > Should we keep -p --printall as option to print all LSN's and keep > default > > as overall max LSN? > > Honestly, I have a hard time imagining the use case for that mode. > This isn't a tool that people should be running regularly, and some > output that lends a bit of confidence that the tool is doing the right > thing seems like a good thing. Keep in mind it's likely to run for > quite a while, too, and this would provide a progress indicator. I'll > defer to whatever the consensus is here but my gut feeling is that if > you don't want that extra output, there's a good chance you're > misusing the tool. With Regards, Amit Kapila.
On Wed, Jul 3, 2013 at 9:51 AM, Amit Kapila <amit.kapila@huawei.com> wrote: > amit@linux:~> md test > amit@linux:~> cd test > amit@linux:~/test> ln -s ~/test link_test > amit@linux:~/test> ls > link_test > amit@linux:~/test> cd link_test > amit@linux:~/test/link_test> ls > link_test > amit@linux:~/test/link_test> cd link_test > amit@linux:~/test/link_test/link_test> cd link_test > amit@linux:~/test/link_test/link_test/link_test> pwd > /home/amit/test/link_test/link_test/link_test > amit@linux:~/test/link_test/link_test/link_test> So what? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wednesday, July 03, 2013 7:41 PM Robert Haas wrote: > On Wed, Jul 3, 2013 at 9:51 AM, Amit Kapila <amit.kapila@huawei.com> > wrote: > > amit@linux:~> md test > > amit@linux:~> cd test > > amit@linux:~/test> ln -s ~/test link_test > > amit@linux:~/test> ls > > link_test > > amit@linux:~/test> cd link_test > > amit@linux:~/test/link_test> ls > > link_test > > amit@linux:~/test/link_test> cd link_test > > amit@linux:~/test/link_test/link_test> cd link_test > > amit@linux:~/test/link_test/link_test/link_test> pwd > > /home/amit/test/link_test/link_test/link_test > > amit@linux:~/test/link_test/link_test/link_test> > > So what? It can cause the error "too many levels of symbolic links". The point was that, in the case of symlinks, we only wanted to allow paths belonging to the data directory, so that such a situation can never occur. However, I think this is not important for this utility to handle, so we can remove that handling. With Regards, Amit Kapila.
On Wednesday, July 03, 2013 1:26 AM Robert Haas Wrote: >On Tue, Jun 25, 2013 at 1:42 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> I think the usecase for this utility isn't big enough to be included in >> postgres since it really can only help in a very limited >> circumstances. And I think it's too likely to be misused for stuff it's >> not useable for (e.g. remastering). >> >> The only scenario I see is that somebody deleted/corrupted >> pg_controldata. In that scenario the tool is supposed to be used to find >> the biggest lsn used so far so the user then can use pg_resetxlog to set >> that as the wal starting point. >> But that can be way much easier solved by just setting the LSN to >> something very, very high. The database cannot be used for anything >> reliable afterwards anyway. >I guess this is true, but I think I'm mildly in favor of including >this anyway. I think I would have used it once or twice, if it had >been there - maybe not even to feed into pg_resetxlog, but just to >check for future LSNs. We don't have anything like a suite of >diagnostic tools in bin or contrib today, for use by database >professionals in fixing things that fall strictly in the realm of >"don't try this at home", and I think we should have such a thing. >Unfortunately this covers about 0.1% of the space I'd like to see >covered, which might be a reason to reject it or at least insist that >it be enhanced first. >At any rate, I don't think this is anywhere near committable as it >stands today. Some random review comments: Thanks for the detailed review. >remove_parent_refernces() is spelled wrong. >Why does this patch need all of this fancy path-handling logic - >specifically, remove_parent_refernces() and make_absolute_path()? >Surely its needs are not that different from pg_ctl or pg_resetxlog or >pg_controldata. If they all need it and it's duplicated, we should >fix that. Otherwise, why the special logic here? >I don't think we need getLinkPath() either. The server has no trouble >finding its files by just using a pathname that follows the symlink. >Why do we need anything different here? The whole point of symlinks >is that you can traverse them transparently, without knowing where >they point. Removed the special handling of path functions. >Duplicating PageHeaderIsValid doesn't seem acceptable. Moreover, >since the point of this is to be able to use it on a damaged cluster, >why is that logic even desirable? I think we'd be better off assuming >the pages to be valid. Corrected. >The calling convention for this utility seems quite baroque. There's >no real need for all of this -p/-P stuff. I think the syntax should >just be: >pg_computemaxlsn file_or_directory... >For each argument, we determine whether it's a file or a directory. >If it's a file, we assume it's a PostgreSQL data file and scan it. If >it's a directory, we check whether it looks like a data directory. If >it does, we recurse through the whole tree structure and find the data >files, and process them. If it doesn't look like a data directory, we >scan each plain file in that directory whose name looks like a >PostgreSQL data file name. With this approach, there's no need to >limit ourselves to a single input argument and no need to specify what >kind of argument it is; the tool just figures it out. Changed to accept file or directory without of options. >I think it would be a good idea to have a mode that prints out the max >LSN found in *each data file* scanned, and then prints out the overall >max found at the end. 
In fact, I think that should perhaps be the >default mode, with -q, --quiet to disable it. When printing out the >per-file data, I think it would be useful to track and print the block >number where the highest LSN in that file was found. I have >definitely had cases where I suspected, but was not certain of, >corruption. One could use a tool like this to hunt for problems, and >then use something like pg_filedump to dump the offending blocks. >That would be a lot easier than running pg_filedump on the whole file >and grepping through the output. Corrected. >Similarly, I see no reason for the restriction imposed by >check_path_belongs_to_pgdata(). I've had people mail me one data >file; why shouldn't I be able to run this tool on it? It's a >read-only utility. >- if (0 != FindMaxLSNinDir(pathbuf, maxlsn, false)) and similar is not >PostgreSQL style. >+ printf(_("%s compute the maximum LSN in PostgreSQL data >pages.\n\n"), progname); Fixed. >+ /* >+ * Don't allow pg_computemaxlsn to be run as root, to avoid overwriting >+ * the ownership of files in the data directory. We need only check for >+ * root -- any other user won't have sufficient permissions to modify >+ * files in the data directory. >+ */ >+ #ifndef WIN32 >+ if (geteuid() == 0) >+ { >+ fprintf(stderr, _("%s: cannot be executed by \"root\".\n"), >+ progname); >+ fprintf(stderr, _("You must run %s as the PostgreSQL >superuser.\n"), >+ progname); >+ exit(1); >+ } >+ #endif >This utility only reads files; it does not modify them. So this seems >unnecessary. I assume it's blindly copied from somewhere else. >+ fprintf(stderr, _("%s: \"%s\" not a valid data directory.\n"), >Extra space. >+ /* >+ * Check for a postmaster lock file --- if there is one, refuse to >+ * proceed, on grounds we might be interfering with a live installation. >+ */ >+ snprintf(path, MAXPGPATH, "%s/postmaster.pid", DataDir); >Again, this might be appropriate for pg_resetxlog, but I see no reason >for the restriction here. The output might be inaccurate in that >case, but if you're using this tool you're required to know what >you're doing. Fixed. >+ For safety reasons, you must specify the data directory on the >command line. >+ <command>pg_computemaxlsn</command> does not use the environment variable >+ <envar>PGDATA</>. >Same thing here. I think using PGDATA would be quite appropriate for >this utility. Fixed. >+ <para> >+ This command must not be used when the server is >+ running. <command>pg_computemaxlsn</command> will refuse to start up if >+ it finds a server lock file in the data directory. If the >+ server crashed then a lock file might have been left >+ behind; in that case you can remove the lock file to allow >+ <command>pg_computemaxlsn</command> to run. But before you do >+ so, make doubly certain that there is no server process still alive. >+ </para> >More of the same paranoia. >Overall my feeling is that this can be simplified quite a lot. >There's a lot of stuff in here that's really not needed. Corrected. Please find the updated patch attached. Regards, Hari babu.
On Thu, Jul 4, 2013 at 2:14 AM, Amit Kapila <amit.kapila@huawei.com> wrote: > It can cause error "too many levels of symbolic links" Sure, so you report the error and exit. No problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
This looks better. + fprintf(stderr, _("%s: .. file \"%s\" for seeking: %s\n"), + progname, filename, strerror(errno)); Weird error message style - what's with the ".."? + fprintf(stderr, _("%s: .. file \"%s\" length is more than segment size: %d.\n"), + progname, filename, RELSEG_SIZE); Again. + fprintf(stderr, _("%s: could not seek to next page \"%s\": %s\n"), + progname, filename, strerror(errno)); I think this should be written as: could not seek to offset NUMBER in file "PATH" + fprintf(stderr, _("%s: could not read file \"%s\": %s\n"), + progname, filename, strerror(errno)); And this one as: could not read file "PATH" at offset NUMBER + printf("File:%s Maximum LSN found is: %X/%X \nWAL segment file name (fileid,seg): %X/%X\n", I think that we don't need to display the WAL segment file name for the per-file progress updates. Instead, let's show the block number where that LSN was found, like this: Highest LSN for file "%s" is %X/%X in block %u. The overall output can stay as you have it. + if (0 != result) Coding style. + static int + FindMaxLSNinFile(char *filename, XLogRecPtr *maxlsn) It seems that this function, and a few others, use -1 for a failure return, 0 for success, and all others undefined. Although this is a defensible choice, I think it would be more PG-like to make the return value bool and use true for success and false for failure. + if (S_ISREG(statbuf.st_mode)) + (void) FindMaxLSNinFile(pathbuf, maxlsn); + else + (void) FindMaxLSNinDir(pathbuf, maxlsn); I don't see why we're throwing away the return value here. I would expect the function to have a "bool result = true" at the top and sent result = false if one of these functions returns false. At the end, it returns result. + This utility can only be run by the user who installed the server, because + it requires read/write access to the data directory. False. + LsnSearchPath = argv[optind]; + + if (LsnSearchPath == NULL) + LsnSearchPath = getenv("PGDATA"); I think you should write the code so that the tool loops over its input arguments; if none, it processes $PGDATA. (Don't forget to update the syntax synopsis and documentation to match.) I think the documentation should say something about the intended uses of this tool, including cautions against using it for things to which it is not well-suited. I guess the threshold question for this patch is whether those uses are enough to justify including the tool in the core distribution. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thursday, July 04, 2013 11:19 PM Robert Haas wrote: >+ fprintf(stderr, _("%s: .. file \"%s\" for seeking: %s\n"), >+ progname, filename, strerror(errno)); >Weird error message style - what's with the ".."? >+ fprintf(stderr, _("%s: .. file \"%s\" length is more than segment >size: %d.\n"), >+ progname, filename, RELSEG_SIZE); >Again. Corrected. >+ fprintf(stderr, _("%s: could not seek to next page \"%s\": %s\n"), >+ progname, filename, strerror(errno)); >I think this should be written as: could not seek to offset NUMBER in >file "PATH" >+ fprintf(stderr, _("%s: could not read file \"%s\": %s\n"), >+ progname, filename, strerror(errno)); >And this one as: could not read file "PATH" at offset NUMBER >+ printf("File:%s Maximum LSN found is: %X/%X \nWAL segment file >name (fileid,seg): %X/%X\n", >I think that we don't need to display the WAL segment file name for >the per-file progress updates. Instead, let's show the block number >where that LSN was found, like this: >Highest LSN for file "%s" is %X/%X in block %u. >The overall output can stay as you have it. Changed as per your suggestion. >+ if (0 != result) >Coding style. >+ static int >+ FindMaxLSNinFile(char *filename, XLogRecPtr *maxlsn) >It seems that this function, and a few others, use -1 for a failure >return, 0 for success, and all others undefined. Although this is a >defensible choice, I think it would be more PG-like to make the return >value bool and use true for success and false for failure. >+ if (S_ISREG(statbuf.st_mode)) >+ (void) FindMaxLSNinFile(pathbuf, maxlsn); >+ else >+ (void) FindMaxLSNinDir(pathbuf, maxlsn); >I don't see why we're throwing away the return value here. I would >expect the function to have a "bool result = true" at the top and sent >result = false if one of these functions returns false. At the end, >it returns result. Fixed. >+ This utility can only be run by the user who installed the server, because >+ it requires read/write access to the data directory. >False. Removed. >+ LsnSearchPath = argv[optind]; >+ >+ if (LsnSearchPath == NULL) >+ LsnSearchPath = getenv("PGDATA"); >I think you should write the code so that the tool loops over its >input arguments; if none, it processes $PGDATA. (Don't forget to >update the syntax synopsis and documentation to match.) Added the functionality of multiple file or directories parsing and printing Max LSN for each input argument. >I think the documentation should say something about the intended uses >of this tool, including cautions against using it for things to which >it is not well-suited. I guess the threshold question for this patch >is whether those uses are enough to justify including the tool in the >core distribution. Added some use cases and notes regarding the tool. Please suggest if any More information needs to be documented. Thanks for the review, please find the updated patch attached in the mail. Regards, Hari babu.
On Friday, July 05, 2013 6:48 PM Hari Babu wrote: >On Thursday, July 04, 2013 11:19 PM Robert Haas wrote: The patch is updated with the following changes. 1. If the input is a data directory, all errors encountered are displayed at once instead of one error at a time. 2. Fixed the problem of the previous file's or directory's result being overwritten by the new one. 3. Updated the docs. Thanks for the review; please find the updated patch attached to the mail. Regards, Hari babu.
On 2013-07-08 16:17:54 +0530, Hari Babu wrote: > + This utility can also be used to decide whether backup is required or not when the data page > + in old-master precedes the last applied LSN in old-standby (i.e., new-master) at the > + moment of the failover. > + </para> > + </refsect1> I don't think this is safe in any interesting set of cases. Am I missing something? People falsely thinking that it can be used for this is the primary reason for me objecting the patch... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Monday, July 08, 2013 4:26 PM Andres Freund wrote: > On 2013-07-08 16:17:54 +0530, Hari Babu wrote: > > + This utility can also be used to decide whether backup is > required or not when the data page > > + in old-master precedes the last applied LSN in old-standby > (i.e., new-master) at the > > + moment of the failover. > > + </para> > > + </refsect1> > > I don't think this is safe in any interesting set of cases. Am I > missing > something? No, you are not missing anything. It can be only used to find max LSN in database which can avoid further corruption > People falsely thinking that it can be used for this is the primary > reason for me objecting the patch... With Regards, Amit Kapila.
On 2013-07-08 17:10:43 +0530, Amit Kapila wrote: > On Monday, July 08, 2013 4:26 PM Andres Freund wrote: > > On 2013-07-08 16:17:54 +0530, Hari Babu wrote: > > > + This utility can also be used to decide whether backup is > > required or not when the data page > > > + in old-master precedes the last applied LSN in old-standby > > (i.e., new-master) at the > > > + moment of the failover. > > > + </para> > > > + </refsect1> > > > > I don't think this is safe in any interesting set of cases. Am I > > missing > > something? > > No, you are not missing anything. It can be only used to find max LSN in > database which can avoid further corruption Why is the patch submitted documenting it as a use-case then? I find it rather scary if the *patch authors* document a known unsafe use case as one of the known use-cases. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Monday, July 08, 2013 5:16 PM Andres Freund wrote: >On 2013-07-08 17:10:43 +0530, Amit Kapila wrote: >> On Monday, July 08, 2013 4:26 PM Andres Freund wrote: >> > On 2013-07-08 16:17:54 +0530, Hari Babu wrote: >> > > + This utility can also be used to decide whether backup is >> > required or not when the data page >> > > + in old-master precedes the last applied LSN in old-standby >> > (i.e., new-master) at the >> > > + moment of the failover. >> > > + </para> >> > > + </refsect1> >> > >> > I don't think this is safe in any interesting set of cases. Am I >> > missing >> > something? >> >> No, you are not missing anything. It can be only used to find max LSN in >> database which can avoid further corruption >Why is the patch submitted documenting it as a use-case then? I find it >rather scary if the *patch authors* document a known unsafe use case as >one of the known use-cases. I got the problem which can occur with the specified use case. Removed the wrong use case specified above. Thanks for the review, please find the updated patch attached in the mail. Regards, Hari babu.
On Monday, July 08, 2013 5:16 PM Andres Freund wrote: > On 2013-07-08 17:10:43 +0530, Amit Kapila wrote: > > On Monday, July 08, 2013 4:26 PM Andres Freund wrote: > > > On 2013-07-08 16:17:54 +0530, Hari Babu wrote: > > > > + This utility can also be used to decide whether backup is > > > required or not when the data page > > > > + in old-master precedes the last applied LSN in old-standby > > > (i.e., new-master) at the > > > > + moment of the failover. > > > > + </para> > > > > + </refsect1> > > > > > > I don't think this is safe in any interesting set of cases. Am I > > > missing > > > something? > > > > No, you are not missing anything. It can be only used to find max LSN > in > > database which can avoid further corruption > > Why is the patch submitted documenting it as a use-case then? This is my mistake, I was not able to catch. I am really sorry for it and in future will make sure such mistake doesn't happen again > I find it > rather scary if the *patch authors* document a known unsafe use case as > one of the known use-cases. With Regards, Amit Kapila.
>On Monday, July 08, 2013 5:16 PM Andres Freund wrote: >>On 2013-07-08 17:10:43 +0530, Amit Kapila wrote: >>> On Monday, July 08, 2013 4:26 PM Andres Freund wrote: >>> > On 2013-07-08 16:17:54 +0530, Hari Babu wrote: >>> > > + This utility can also be used to decide whether backup is >>> > required or not when the data page >>> > > + in old-master precedes the last applied LSN in old-standby >>> > (i.e., new-master) at the >>> > > + moment of the failover. >>> > > + </para> >>> > > + </refsect1> >>> > >>> > I don't think this is safe in any interesting set of cases. Am I >>> > missing >>> > something? >>> >>> No, you are not missing anything. It can be only used to find max LSN in >>> database which can avoid further corruption >>Why is the patch submitted documenting it as a use-case then? I find it >>rather scary if the *patch authors* document a known unsafe use case as >>one of the known use-cases. >I got the problem which can occur with the specified use case. Removed the >wrong use case specified above. >Thanks for the review, please find the updated patch attached in the mail. Patch is not getting compiled on Windows, I had made following changes: a. updated the patch for resolving Windows build b. few documentation changes in (pg_resetxlog.sgml) for spelling mistake and minor line change c. corrected year for Copyright in file pg_computemaxlsn.c With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Sep 14, 2013 at 3:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>On Monday, July 08, 2013 5:16 PM Andres Freund wrote: >>>On 2013-07-08 17:10:43 +0530, Amit Kapila wrote: >>>> On Monday, July 08, 2013 4:26 PM Andres Freund wrote: >>>> > On 2013-07-08 16:17:54 +0530, Hari Babu wrote: >>>> > > + This utility can also be used to decide whether backup is >>>> > required or not when the data page >>>> > > + in old-master precedes the last applied LSN in old-standby >>>> > (i.e., new-master) at the >>>> > > + moment of the failover. >>>> > > + </para> >>>> > > + </refsect1> >>>> > >>>> > I don't think this is safe in any interesting set of cases. Am I >>>> > missing >>>> > something? >>>> >>>> No, you are not missing anything. It can be only used to find max LSN in >>>> database which can avoid further corruption > >>>Why is the patch submitted documenting it as a use-case then? I find it >>>rather scary if the *patch authors* document a known unsafe use case as >>>one of the known use-cases. > >>I got the problem which can occur with the specified use case. Removed the >>wrong use case specified above. >>Thanks for the review, please find the updated patch attached in the mail. > > Patch is not getting compiled on Windows, I had made following changes: > a. updated the patch for resolving Windows build > b. few documentation changes in (pg_resetxlog.sgml) for spelling > mistake and minor line change > c. corrected year for Copyright in file pg_computemaxlsn.c I am OK with this patch in its current form, modulo some grammar issues in the documentation which I can fix before committing. However, I'm unclear whether there was sufficient consensus to proceed with this. Can others weigh in? If there is too much residual unhappiness with this, then we should just mark this as Rejected and stop wasting time on it; it can be pushed to PGXN or similar even if we don't put it in core. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 9, 2013 at 11:58 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Sep 14, 2013 at 3:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>On Monday, July 08, 2013 5:16 PM Andres Freund wrote: >>>>On 2013-07-08 17:10:43 +0530, Amit Kapila wrote: >>>>> On Monday, July 08, 2013 4:26 PM Andres Freund wrote: >>>>> > On 2013-07-08 16:17:54 +0530, Hari Babu wrote: >>>>> > > + This utility can also be used to decide whether backup is >>>>> > required or not when the data page >>>>> > > + in old-master precedes the last applied LSN in old-standby >>>>> > (i.e., new-master) at the >>>>> > > + moment of the failover. >>>>> > > + </para> >>>>> > > + </refsect1> >>>>> > >>>>> > I don't think this is safe in any interesting set of cases. Am I >>>>> > missing >>>>> > something? >>>>> >>>>> No, you are not missing anything. It can be only used to find max LSN in >>>>> database which can avoid further corruption >> >>>>Why is the patch submitted documenting it as a use-case then? I find it >>>>rather scary if the *patch authors* document a known unsafe use case as >>>>one of the known use-cases. >> >>>I got the problem which can occur with the specified use case. Removed the >>>wrong use case specified above. >>>Thanks for the review, please find the updated patch attached in the mail. >> >> Patch is not getting compiled on Windows, I had made following changes: >> a. updated the patch for resolving Windows build >> b. few documentation changes in (pg_resetxlog.sgml) for spelling >> mistake and minor line change >> c. corrected year for Copyright in file pg_computemaxlsn.c > > I am OK with this patch in its current form, modulo some grammar > issues in the documentation which I can fix before committing. > However, I'm unclear whether there was sufficient consensus to proceed > with this. I would like to summarize this patch in short. The idea started with me proposing some solutions for cases where a user can recover from certain corruption scenarios; Tom Lane then suggested this idea, I prepared a patch for it, and you gave suggestions on the patch at various phases of development. Both Fujii Masao and Alvaro seem to be in favor of the use case, and Alvaro has given a few suggestions for the patch as well. Muhammad Usama reviewed this patch. It appears to me that Andres is not in favor of this patch. So I think a good number of people have participated in this patch and were in favor of it. The above summary is from what I remember about this patch, so if anybody feels that I have misunderstood, kindly correct me. > Can others weigh in? If there is too much residual > unhappiness with this, then we should just mark this as Rejected and > stop wasting time on it; it can be pushed to PGXN or similar even if > we don't put it in core. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 9, 2013 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Sep 14, 2013 at 3:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>On Monday, July 08, 2013 5:16 PM Andres Freund wrote: >>>>On 2013-07-08 17:10:43 +0530, Amit Kapila wrote: >>>>> On Monday, July 08, 2013 4:26 PM Andres Freund wrote: >>>>> > On 2013-07-08 16:17:54 +0530, Hari Babu wrote: >>>>> > > + This utility can also be used to decide whether backup is >>>>> > required or not when the data page >>>>> > > + in old-master precedes the last applied LSN in old-standby >>>>> > (i.e., new-master) at the >>>>> > > + moment of the failover. >>>>> > > + </para> >>>>> > > + </refsect1> >>>>> > >>>>> > I don't think this is safe in any interesting set of cases. Am I >>>>> > missing >>>>> > something? >>>>> >>>>> No, you are not missing anything. It can be only used to find max LSN in >>>>> database which can avoid further corruption >> >>>>Why is the patch submitted documenting it as a use-case then? I find it >>>>rather scary if the *patch authors* document a known unsafe use case as >>>>one of the known use-cases. >> >>>I got the problem which can occur with the specified use case. Removed the >>>wrong use case specified above. >>>Thanks for the review, please find the updated patch attached in the mail. >> >> Patch is not getting compiled on Windows, I had made following changes: >> a. updated the patch for resolving Windows build >> b. few documentation changes in (pg_resetxlog.sgml) for spelling >> mistake and minor line change >> c. corrected year for Copyright in file pg_computemaxlsn.c > > I am OK with this patch in its current form, modulo some grammar > issues in the documentation which I can fix before committing. > However, I'm unclear whether there was sufficient consensus to proceed > with this. Can others weigh in? If there is too much residual > unhappiness with this, then we should just mark this as Rejected and > stop wasting time on it; it can be pushed to PGXN or similar even if > we don't put it in core. I didn't hear any other votes. Anyone else have an opinion about this? If I can't get a +1 from anyone who wasn't involved in writing the patch, I'm inclined to think we don't have sufficient consensus to commit this. On further review of the patch, I also found a number of other issues that I think need to be fixed before we could consider committing it: - Per a previous request of mine, the patch has three different modes: it can be run on an individual file, on a directory potentially containing multiple relation files, or on an entire data directory. This is not explained in the documentation. - The patch skips printing an error if attempting to open a file returns ENOENT. I don't see why that case shouldn't print an error. Yeah, it could be legit if you're executing this against a running server, but why are you doing that? And even if you are (e.g. for corruption detection), printing a error message and proceeding makes more sense than proceeding without printing anything, at least IMHO. - Some of the static functions in this file preceed main and others follow it. And they use different naming conventions. I suggest putting all of them after main() and using names like check_data_dir, check_data_file_name (instead of validateRelfilenodename; note that the comment for that function is also incorrect), find_max_lsn_in_file, find_max_lsn_in_directory, find_max_lsn_in_pgdata. 
- Since the patch goes to some pains to exit with 0 if no errors were encountered and 1 if any were, that probably ought to be documented. But I think instead of passing a "result" argument back up the call stack it would be better to just keep a global variable called "errors_encountered". When we encounter an error, bump that value. Then you can just do exit(errors_encountered > 0 ? 1 : 0) and not bother passing all of this stuff around via the return value. The current behavior is inconsistent in another way, too: if you encounter an error while scanning through a particular directory, you finish the whole directory. But if multiple command-line arguments were passed, you don't proceed to any subsequent command-line arguments. I think you should continue always, which the above-mentioned change will take care of basically for free. - The description of the -q option is not very clear. A reader could be forgiven for thinking that the option suppresses all output, which would be quite useless, or at least left in doubt about what output will still be provided. A broader complaint I have with this patch is that it almost but not-quite solves a problem I've had a few times in the past: namely, searching through the data directory for data blocks which have LSNs in the future. This has come up a few times for me, and this tool would make it easier, because I'd be able to run it and look through the output to see which relations have high max-LSN values. However, it wouldn't be quite enough, because it'd only tell me about the block with the highest LSN in each file, whereas what I'd really want to find is every block with an LSN greater than some threshold value. Maybe I'm pushing the envelope too much by trying to fit that into the framework of this patch, but I can't help thinking we're not going to want both pg_computemaxlsn and pg_findlsnsaftersomethreshold that are 95% the same code, so maybe we ought to rename the utility to something slightly more generic than "pg_computemaxlsn". In some ways this is almost a plausible shell for a generic block-corruption checker. If we wanted to validate, for example, that every page in a relation (or a whole data directory) had the expected page version number, we'd want almost exactly this same code structure, just with more checks added in. I'm going to mark this "Returned with Feedback" in the CF app for now.I still think that there's promise in this idea but,on further reflection, committing what we have here now doesn't feel like the right decision. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
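To make the "every block with an LSN greater than some threshold" idea (together with the errors_encountered suggestion above) concrete, here is a rough standalone sketch, not code from the patch. It assumes the 9.3-era on-disk page header layout, where pd_lsn occupies the first eight bytes of each page as two native-order 32-bit halves, high half first.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define BLCKSZ 8192             /* stand-in for PostgreSQL's block size */

    static int  errors_encountered = 0;

    /*
     * Report every block whose page LSN exceeds the given threshold.  The page
     * LSN (pd_lsn) is read from the first eight bytes of the page header.
     */
    static void
    report_blocks_above(const char *filename, uint64_t threshold)
    {
        char        page[BLCKSZ];
        unsigned int blkno = 0;
        FILE       *fp = fopen(filename, "rb");

        if (fp == NULL)
        {
            fprintf(stderr, "could not open file \"%s\"\n", filename);
            errors_encountered++;
            return;
        }

        while (fread(page, 1, BLCKSZ, fp) == BLCKSZ)
        {
            uint32_t    hi,
                        lo;

            memcpy(&hi, page, sizeof(hi));      /* pd_lsn, high half  */
            memcpy(&lo, page + 4, sizeof(lo));  /* pd_lsn, low half   */

            if ((((uint64_t) hi << 32) | lo) > threshold)
                printf("file \"%s\" block %u: LSN %X/%X\n",
                       filename, blkno, (unsigned int) hi, (unsigned int) lo);
            blkno++;
        }

        if (ferror(fp))
        {
            fprintf(stderr, "could not read file \"%s\"\n", filename);
            errors_encountered++;
        }
        fclose(fp);
    }

    int
    main(int argc, char **argv)
    {
        unsigned int thi,
                    tlo;
        int         i;

        if (argc < 3 || sscanf(argv[1], "%X/%X", &thi, &tlo) != 2)
        {
            fprintf(stderr, "usage: %s THRESHOLD_HI/THRESHOLD_LO file...\n", argv[0]);
            return 1;
        }

        for (i = 2; i < argc; i++)
            report_blocks_above(argv[i], ((uint64_t) thi << 32) | tlo);

        return errors_encountered > 0 ? 1 : 0;
    }

Errors are counted in a global and only affect the exit status, so every argument is always processed; with a threshold of 0/0 the same loop degenerates into listing the LSN of every block, which is close to what the verbose per-file mode discussed above would print.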
Robert Haas escribió: > A broader complaint I have with this patch is that it almost but > not-quite solves a problem I've had a few times in the past: namely, > searching through the data directory for data blocks which have LSNs > in the future. This has come up a few times for me, and this tool > would make it easier, because I'd be able to run it and look through > the output to see which relations have high max-LSN values. However, > it wouldn't be quite enough, because it'd only tell me about the block > with the highest LSN in each file, whereas what I'd really want to > find is every block with an LSN greater than some threshold value. > Maybe I'm pushing the envelope too much by trying to fit that into the > framework of this patch, but I can't help thinking we're not going to > want both pg_computemaxlsn and pg_findlsnsaftersomethreshold that are > 95% the same code, so maybe we ought to rename the utility to > something slightly more generic than "pg_computemaxlsn". Perhaps not coincidentally, I had a need to do this recently. Perhaps we should turn the utility into a generic tool to report existing LSNs, with options to 1) report only the highest one in a given file, 2) report only those that exceed some threshold. So maybe pg_reportlsn or pg_extractlsn. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Oct 18, 2013 at 9:01 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Oct 9, 2013 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Sat, Sep 14, 2013 at 3:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>>On Monday, July 08, 2013 5:16 PM Andres Freund wrote: >>>>>On 2013-07-08 17:10:43 +0530, Amit Kapila wrote: >>>>>> On Monday, July 08, 2013 4:26 PM Andres Freund wrote: >>>>>> > On 2013-07-08 16:17:54 +0530, Hari Babu wrote: >> I am OK with this patch in its current form, modulo some grammar >> issues in the documentation which I can fix before committing. >> However, I'm unclear whether there was sufficient consensus to proceed >> with this. Can others weigh in? If there is too much residual >> unhappiness with this, then we should just mark this as Rejected and >> stop wasting time on it; it can be pushed to PGXN or similar even if >> we don't put it in core. > > I didn't hear any other votes. Anyone else have an opinion about > this? If I can't get a +1 from anyone who wasn't involved in writing > the patch, I'm inclined to think we don't have sufficient consensus to > commit this. > > On further review of the patch, I also found a number of other issues > that I think need to be fixed before we could consider committing it: > > - Per a previous request of mine, the patch has three different modes: > it can be run on an individual file, on a directory potentially > containing multiple relation files, or on an entire data directory. > This is not explained in the documentation. There is some explanation about it, but I think you want to see in different format or wording. I will change it to explain it more clearly in next update of this patch. + <title>Description</title> + <para> + <command>pg_computemaxlsn</command> computes maximun LSN from database pages + in the specified list of files or directories. + </para> + + <para> + If user doesn't provide the file or directory to find the max lsn then + <command>pg_computemaxlsn</command> use the environment variable <envar>PGDATA</> + if exists otherwise reports an error. + </para> > - The patch skips printing an error if attempting to open a file > returns ENOENT. I don't see why that case shouldn't print an error. > Yeah, it could be legit if you're executing this against a running > server, but why are you doing that? And even if you are (e.g. for > corruption detection), printing a error message and proceeding makes > more sense than proceeding without printing anything, at least IMHO. > > - Some of the static functions in this file preceed main and others > follow it. And they use different naming conventions. I suggest > putting all of them after main() and using names like check_data_dir, > check_data_file_name (instead of validateRelfilenodename; note that > the comment for that function is also incorrect), > find_max_lsn_in_file, find_max_lsn_in_directory, > find_max_lsn_in_pgdata. > > - Since the patch goes to some pains to exit with 0 if no errors were > encountered and 1 if any were, that probably ought to be documented. > But I think instead of passing a "result" argument back up the call > stack it would be better to just keep a global variable called > "errors_encountered". When we encounter an error, bump that value. > Then you can just do exit(errors_encountered > 0 ? 1 : 0) and not > bother passing all of this stuff around via the return value. 
The > current behavior is inconsistent in another way, too: if you encounter > an error while scanning through a particular directory, you finish the > whole directory. But if multiple command-line arguments were passed, > you don't proceed to any subsequent command-line arguments. I think > you should continue always, which the above-mentioned change will take > care of basically for free. > > - The description of the -q option is not very clear. A reader could > be forgiven for thinking that the option suppresses all output, which > would be quite useless, or at least left in doubt about what output > will still be provided. Thank you for your feedback. I will update it in the next version of the patch if there is a consensus to proceed with this utility. > A broader complaint I have with this patch is that it almost but > not-quite solves a problem I've had a few times in the past: namely, > searching through the data directory for data blocks which have LSNs > in the future. This has come up a few times for me, and this tool > would make it easier, because I'd be able to run it and look through > the output to see which relations have high max-LSN values. However, > it wouldn't be quite enough, because it'd only tell me about the block > with the highest LSN in each file, whereas what I'd really want to > find is every block with an LSN greater than some threshold value. > Maybe I'm pushing the envelope too much by trying to fit that into the > framework of this patch, but I can't help thinking we're not going to > want both pg_computemaxlsn and pg_findlsnsaftersomethreshold that are > 95% the same code, so maybe we ought to rename the utility to > something slightly more generic than "pg_computemaxlsn". > > In some ways this is almost a plausible shell for a generic > block-corruption checker. If we wanted to validate, for example, that > every page in a relation (or a whole data directory) had the expected > page version number, we'd want almost exactly this same code > structure, just with more checks added in. Okay, I have some ideas for making the use case of this utility a bit broader, so that it can help detect corruption in many more cases. The utility could be extended to validate the database in the following ways: 1. Validate blocks based on checksums, if the database has checksums enabled. 2. Validate the page version number in each page. 3. Validate that all data files exist. 4. Validate whether a block contains all zeros. 5. Validate whether the rows in a block are intact (I don't have a clear idea yet of how to achieve this). 6. Report empty pages in a file. More validation ideas can be thought of, but I think we can start with something minimal, like (a) its current usage of computing the max LSN, extended to find LSNs greater than a threshold, and (b) checksum validation. A sketch of one of the simpler checks follows this message. > I'm going to mark this "Returned with Feedback" in the CF app for now. > I still think that there's promise in this idea but, on further > reflection, committing what we have here now doesn't feel like the > right decision. Thank you very much for the valuable suggestions and for supporting the idea. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
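As one illustration of how small some of the checks listed above can be, here is a hypothetical sketch of items 4 and 6 (reporting all-zero blocks); it needs nothing beyond reading the file block by block, and the checksum and page-version checks would slot into the same loop. The names are invented for illustration and are not code from the patch.

    #include <stdio.h>
    #include <string.h>

    #define BLCKSZ 8192             /* stand-in for PostgreSQL's block size */

    /* Report blocks that are entirely zero (empty/uninitialized pages). */
    static void
    report_zero_blocks(const char *filename)
    {
        static const char zeroes[BLCKSZ];   /* all-zero reference block */
        char        page[BLCKSZ];
        unsigned int blkno = 0;
        FILE       *fp = fopen(filename, "rb");

        if (fp == NULL)
        {
            fprintf(stderr, "could not open file \"%s\"\n", filename);
            return;
        }

        while (fread(page, 1, BLCKSZ, fp) == BLCKSZ)
        {
            if (memcmp(page, zeroes, BLCKSZ) == 0)
                printf("file \"%s\" block %u is all zeroes\n", filename, blkno);
            blkno++;
        }

        fclose(fp);
    }

    int
    main(int argc, char **argv)
    {
        int         i;

        for (i = 1; i < argc; i++)
            report_zero_blocks(argv[i]);
        return 0;
    }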
On Fri, Oct 18, 2013 at 9:24 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Robert Haas escribió: > >> A broader complaint I have with this patch is that it almost but >> not-quite solves a problem I've had a few times in the past: namely, >> searching through the data directory for data blocks which have LSNs >> in the future. This has come up a few times for me, and this tool >> would make it easier, because I'd be able to run it and look through >> the output to see which relations have high max-LSN values. However, >> it wouldn't be quite enough, because it'd only tell me about the block >> with the highest LSN in each file, whereas what I'd really want to >> find is every block with an LSN greater than some threshold value. >> Maybe I'm pushing the envelope too much by trying to fit that into the >> framework of this patch, but I can't help thinking we're not going to >> want both pg_computemaxlsn and pg_findlsnsaftersomethreshold that are >> 95% the same code, so maybe we ought to rename the utility to >> something slightly more generic than "pg_computemaxlsn". > > Perhaps not coincidentally, I had a need to do this recently. Perhaps > we should turn the utility into a generic tool to report existing LSNs, > with options to 1) report only the highest one in a given file, 2) > report only those that exceed some threshold. So maybe pg_reportlsn or > pg_extractlsn. How about extending it to validate the database in a more meaningful way and naming it validatedb? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com