Re: Review: Patch to compute Max LSN of Data Pages - Mailing list pgsql-hackers

From Muhammad Usama
Subject Re: Review: Patch to compute Max LSN of Data Pages
Date
Msg-id CAEJvTzV1u1FQnB8r-DOszrvvSzxemAzPZUZ74M0VWmvB019mCw@mail.gmail.com
Whole thread Raw
In response to Re: Review: Patch to compute Max LSN of Data Pages  (Amit kapila <amit.kapila@huawei.com>)
Responses Re: Review: Patch to compute Max LSN of Data Pages
List pgsql-hackers


On Tue, Nov 27, 2012 at 5:52 PM, Amit kapila <amit.kapila@huawei.com> wrote:
Friday, November 23, 2012 5:38 AM Muhammad Usama wrote:
 
> Observations and Comments
> -------------------------------------------

> - If no option is given to pg_computemaxlsn utility then we get a wrong
> error message
> ./pg_computemaxlsn  ../data
> pg_computemaxlsn: file or directory "(null)" exists
> In my opinion we should consider the -P (Print max LSN from whole data
> directory) as default in case no option is specified, or otherwise throw a
> more descriptive error.

 

Fixed. By considering default option as Whole data directory.



> - Options -p and -P should be mutually exclusive
 
Fixed. Both options will not be allowed at the same time.



- Long options missing for -p and -P

 

Fixed. introduced long options as --path & --data-directory.



> - Help string for -P  should be something like "print max LSN from whole
> data directory" instead of  "print max LSN from whole database"
 
Fixed, by changing the message as suggested.


> - Help menu is silent about -V or --version option
 
Fixed.


> - I think when finding the max LSN of single file the utility should
> consider all relation segments.
 
IMHO, as this utility is not aware of relation or any other logical entity and deals with only file or directory,

  it is okay to find only for that file.




> - There is no check for extra arguments
> e.g
> ./pg_computemaxlsn  -P . ../data/ ../data/base/ ../data2/
 
Fixed.
        - Extra options gives error.
        - Multiple -p options also gives error.
        Eg:
        ./pg_computemaxlsn
-p base/12286/12200 -p base/12286/12201 data



> - For -p {file | dir}  option the utility expects the file path relative to
> the specified data directory path which makes thing little confusing
> for example
> ./pg_computemaxlsn -p data/base/12286/12200 data
> pg_computemaxlsn: file or directory "data/base/12286/12200" does not exists
> although data/base/12286/12200 is valid file path but we gets file does not
> exists error
As the path of file is relative to data directory, that's why in error it prints the path as "data/base/12286/12200".
Do you have any suggestion what should be done for this?
 
I think we should expect provided path to be relative to current directory or may consider it to be relative to either one of Data or CWD.
Because normally we expect path to be relative to CWD if some program is asking for path in command line. And also tab-complete doesn't works on terminal if path is not absolute or relative to the current directory.
To handle this I think we should not change current directory to the data directory and generate the file path (like for postmaster.pid) where required by appending data directory path.
And before throwing an error should check if path is valid for either DATA or CWD something like

if (lstat(LsnSearchPath, &fst) < 0)
{
if (errno == ENOENT)
{
/* Check if the path is relative to provided data directory */
snprintf(path, MAXPGPATH, "%s/%s",DataDir,LsnSearchPath);
if (lstat(path, &fst) < 0)
{
if (errno == ENOENT)
{
/* Check if the path is relative to provided data directory */
fprintf(stderr, _("%s: file or directory \"%s\" does not exists.\n"),
progname, LsnSearchPath);
}
else
{
fprintf(stderr, _("%s: could not stat file or directory \"%s\": %s\n"),
progname, path, strerror(errno));
}
exit(1);
}
else /* Path is relative to data directory*/
LsnSearchPath = strdup(path);
}
else
{
fprintf(stderr, _("%s: could not stat file or directory \"%s\": %s\n"),
progname, LsnSearchPath, strerror(errno));
exit(1);
}
}
   


> - Utility should print the max LSN number in hexadecimal notation and LSN
> format (logid/segno) for consistency with PG,
 
Fixed
 
> and may also print the file name which contains that LSN.
 
It might be confusing to users for the cases where it has to follow link (pg_tblspc) and then print some absolute path which is not related to relative path given by user.
Also I am not sure if it will be of any use to user.
 


> - When finding max LSN from directory, the utility does not considers the
> sub directories, which I think could be useful in some cases, like if we
> want to find the max LSN in tablespace directory. With the current
> implementation we would require to run the utility multiple times, once for
> each database.
 
It will consider sub-directories as well.



> Code Review
> ---------------------

> Code generally looks good, I have few small points.

> - In main() function variable maxLSN is defined as uint64,
> Although XLogRecPtr and uint64 are the same thing, but I think we should
> use XLogRecPtr instead.
Changed as per suggestion.

> - Just a small thing although will hardly improve anything but still for
> sake of saving few cycles, I think you should use XLByteLT ( < ) instead
> of XLByteLE ( <= )
> to find if the previous max LSN is lesser then current.
 
Changed as per suggestion.


> - '\n' is missing from one the printf in the code :-)
> fprintf(stderr, _("%s: file or directory \"%s\" does not exists"),
> progname, LsnSearchPath);
 
Changed as per suggestion.


> - While finding the max LSN from single file, you are not verifying that
> the file actually exists inside the data directory path provided. and can
> end up looking
>at wrong data directory for checking if the server is running.
> For example:
> bin/pg_computemaxlsn -p $PWD/data/base/12286/12200
> $PWD/otherinstallation/data
> Maximum LSN found is: 0, WAL segment file name (fileid,seg):
> 0000000000000000

> - Code is not verifying, if the provided data directory is the valid data
> directory, This could make pg_computemaxlsn to compute max LSN from the
> running server file.
> For example:
> bin/pg_computemaxlsn -p $PWD/data/base/12286/12200 NONDATADIR/
> Maximum LSN found is: 0, WAL segment file name (fileid,seg):
>0000000000000000
 
Fixed, by allowing only relative path for file or directory and check if that is relative and below data directory with function path_is_relative_and_below_cwd().


> Questions
> -----------------

> - I am wondering why are you not following the link inside the "pg_tblspc"
> directory to get to the table space location instead of building the
> directory path. I am thinking if you follow the link instead, The utility
> can be used across the different versions of PG.
Changed as per suggestion.
 
 
Apart from above, updated the documentation.
With Regards,
Amit Kapila.


With Regards
Muhammad Usama

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: PQconninfo function for libpq
Next
From: Muhammad Usama
Date:
Subject: Re: Review: Patch to compute Max LSN of Data Pages