Thread: pg_ctl status with nonexistent data directory

pg_ctl status with nonexistent data directory

From
Peter Eisentraut
Date:
This doesn't seem right:

$ pg_ctl -D /nowhere status
pg_ctl: no server running

It does exit with status 3, so it's not all that broken, but I think the
error message could be more accurate.





Re: pg_ctl status with nonexistent data directory

From
Robert Haas
Date:
On Sat, Nov 2, 2013 at 3:32 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> This doesn't seem right:
>
> $ pg_ctl -D /nowhere status
> pg_ctl: no server running
>
> It does exit with status 3, so it's not all that broken, but I think the
> error message could be more accurate.

I doubt anyone will object if you feel the urge to fix it.  I won't, anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_ctl status with nonexistent data directory

From
Bruce Momjian
Date:
On Mon, Nov  4, 2013 at 11:31:30AM -0500, Robert Haas wrote:
> On Sat, Nov 2, 2013 at 3:32 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> > This doesn't seem right:
> >
> > $ pg_ctl -D /nowhere status
> > pg_ctl: no server running
> >
> > It does exit with status 3, so it's not all that broken, but I think the
> > error message could be more accurate.
>
> I doubt anyone will object if you feel the urge to fix it.  I won't, anyway.

I have addressed this issue with the attached patch:

    $ pg_ctl -D /lkjasdf status
    pg_ctl: directory "/lkjasdf" does not exist
    $ pg_ctl -D / status
    pg_ctl: directory "/" is not a database cluster directory

One odd question is that pg_ctl status has this comment for reporting
the exit code for non-running servers:

    printf(_("%s: no server running\n"), progname);

    /*
     * The Linux Standard Base Core Specification 3.1 says this should return
     * '3'
     * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
     */
    exit(3);

If they haven't passed us a data directory, we don't really know if the
server is running or not, so the patch just returns '1'.


--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment

Re: pg_ctl status with nonexistent data directory

From
Amit Kapila
Date:
On Thu, Mar 6, 2014 at 4:38 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Nov  4, 2013 at 11:31:30AM -0500, Robert Haas wrote:
>> On Sat, Nov 2, 2013 at 3:32 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> > This doesn't seem right:
>> >
>> > $ pg_ctl -D /nowhere status
>> > pg_ctl: no server running
>> >
>> > It does exit with status 3, so it's not all that broken, but I think the
>> > error message could be more accurate.
>>
>> I doubt anyone will object if you feel the urge to fix it.  I won't, anyway.
>
> I have addressed this issue with the attached patch:
>
>         $ pg_ctl -D /lkjasdf status
>         pg_ctl: directory "/lkjasdf" does not exist
>         $ pg_ctl -D / status
>         pg_ctl: directory "/" is not a database cluster directory
>
> One odd question is that pg_ctl status has this comment for reporting
> the exit code for non-running servers:
>
>     printf(_("%s: no server running\n"), progname);
>
>     /*
>      * The Linux Standard Base Core Specification 3.1 says this should return
>      * '3'
>      * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
>      */
>     exit(3);
>
> If they haven't passed us a data directory, we don't really know if the
> server is running or not, so the patch just returns '1'.

But for such cases, isn't the status 4 more appropriate?
As per above link "4 program or service status is unknown"

status 1 - "1 program is dead and /var/run pid file exists"
Going by this definition, it seems status 1 means, someone
has forcefully killed the server and pid file still remains.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: pg_ctl status with nonexistent data directory

From
Bruce Momjian
Date:
On Thu, Mar  6, 2014 at 09:54:57AM +0530, Amit Kapila wrote:
> > If they haven't passed us a data directory, we don't really know if the
> > server is running or not, so the patch just returns '1'.
> 
> But for such cases, isn't the status 4 more appropriate?
> As per above link "4 program or service status is unknown"
> 
> status 1 - "1 program is dead and /var/run pid file exists"
> Going by this definition, it seems status 1 means, someone
> has forcefully killed the server and pid file still remains.

Technically, you are right, but I tried a while ago to assign meaningful
values to all the exit locations and the community feedback I got was
that we didn't want that.  I don't see how specifying non-existant or
non-cluster directory would somehow be a case that would be special.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_ctl status with nonexistent data directory

From
Florian Pflug
Date:
On Mar6, 2014, at 00:08 , Bruce Momjian <bruce@momjian.us> wrote:
> I have addressed this issue with the attached patch:
>
>     $ pg_ctl -D /lkjasdf status
>     pg_ctl: directory "/lkjasdf" does not exist
>     $ pg_ctl -D / status
>     pg_ctl: directory "/" is not a database cluster directory
>
> One odd question is that pg_ctl status has this comment for reporting
> the exit code for non-running servers:
>
>    printf(_("%s: no server running\n"), progname);
>
>    /*
>     * The Linux Standard Base Core Specification 3.1 says this should return
>     * '3'
>     * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
>     */
>    exit(3);
>
> If they haven't passed us a data directory, we don't really know if the
> server is running or not, so the patch just returns '1'.

Why change the exit code at all in the ENOENT-case? If the directory
does not exist, it's fairly certain that the server is not running, so
"3" seems fine. Technically, changing the return value is an API change
and might break things, so why do it if there's no clear benefit?

In the EPERM case (or, rather the non-ENOENT case), I agree with Amit
that "4" (meaning "program or service status is unknown") fits much better
than "1" (meaning "program is dead and /var/run pid file exists"). So *if*
we change it at all, we should change it to 4, not to some other, equally
arbitrary value.

best regards,
Florian Pflug








Re: pg_ctl status with nonexistent data directory

From
Alvaro Herrera
Date:
Bruce Momjian escribió:

> Technically, you are right, but I tried a while ago to assign meaningful
> values to all the exit locations and the community feedback I got was
> that we didn't want that.

That sounds odd.  Do you have a link?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_ctl status with nonexistent data directory

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Bruce Momjian escribi�:
>> Technically, you are right, but I tried a while ago to assign meaningful
>> values to all the exit locations and the community feedback I got was
>> that we didn't want that.

> That sounds odd.  Do you have a link?

FWIW, I recall that in my former life at Red Hat, I had to deal several
times with patching pg_ctl to make its exit codes more LSB-compliant.
And I think Debian does the same.  So Linux packagers, at least, would
like to see us pay more attention to that standard.  On the other hand,
there is no market for changes that make the exit codes "more
meaningful" unless the changes can be justified by chapter and verse
in LSB.
        regards, tom lane



Re: pg_ctl status with nonexistent data directory

From
Bruce Momjian
Date:
On Thu, Mar  6, 2014 at 12:17:55PM -0300, Alvaro Herrera wrote:
> Bruce Momjian escribió:
> 
> > Technically, you are right, but I tried a while ago to assign meaningful
> > values to all the exit locations and the community feedback I got was
> > that we didn't want that.
> 
> That sounds odd.  Do you have a link?

Sure, the patch is here:
http://www.postgresql.org/message-id/20130629025033.GI13790@momjian.us

and the idea of keeping what we have is stated here:
http://www.postgresql.org/message-id/51D1E482.5090602@gmx.net

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_ctl status with nonexistent data directory

From
Amit Kapila
Date:
On Thu, Mar 6, 2014 at 7:46 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Mar  6, 2014 at 09:54:57AM +0530, Amit Kapila wrote:
>> > If they haven't passed us a data directory, we don't really know if the
>> > server is running or not, so the patch just returns '1'.
>>
>> But for such cases, isn't the status 4 more appropriate?
>> As per above link "4 program or service status is unknown"
>>
>> status 1 - "1 program is dead and /var/run pid file exists"
>> Going by this definition, it seems status 1 means, someone
>> has forcefully killed the server and pid file still remains.
>
> Technically, you are right, but I tried a while ago to assign meaningful
> values to all the exit locations and the community feedback I got was
> that we didn't want that.  I don't see how specifying non-existant or
> non-cluster directory would somehow be a case that would be special.

One reason could be that as we are already returning special exit code
for 'status' option of pg_ctl (exit(3)), this seems to be inline with it as this
also get called during status command.

Also in the link sent by you in another mail, I could see some statement
which indicates it is more important to return correct codes in case of
status which sounds a good reason to me.

Link and statement
http://www.postgresql.org/message-id/51D1E482.5090602@gmx.net

"The "status" case is different, because there the exit code
can be passed out by the init script directly."

If we just want to handle correct exit codes for "status" option, then
may be we need to refactor the code a bit to ensure the same.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: pg_ctl status with nonexistent data directory

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Thu, Mar  6, 2014 at 12:17:55PM -0300, Alvaro Herrera wrote:
>> Bruce Momjian escribi�:
>>> Technically, you are right, but I tried a while ago to assign meaningful
>>> values to all the exit locations and the community feedback I got was
>>> that we didn't want that.

>> That sounds odd.  Do you have a link?

> Sure, the patch is here:
>     http://www.postgresql.org/message-id/20130629025033.GI13790@momjian.us
> and the idea of keeping what we have is stated here:
>     http://www.postgresql.org/message-id/51D1E482.5090602@gmx.net

Perhaps I shouldn't be putting words in Peter's mouth, but my reading of
his complaint was that he didn't think you'd mapped the pg_ctl failure
conditions to LSB status codes very well.  That's not necessarily a vote
against the abstract idea of making it more LSB-compliant.

But it seems like we might have to go through it case-by-case to argue out
what's the right error code for each case ... and I'm not sure anybody
thinks it's worth that much effort.
        regards, tom lane



Re: pg_ctl status with nonexistent data directory

From
Bruce Momjian
Date:
On Thu, Mar  6, 2014 at 10:43:01PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Thu, Mar  6, 2014 at 12:17:55PM -0300, Alvaro Herrera wrote:
> >> Bruce Momjian escribi�:
> >>> Technically, you are right, but I tried a while ago to assign meaningful
> >>> values to all the exit locations and the community feedback I got was
> >>> that we didn't want that.
> 
> >> That sounds odd.  Do you have a link?
> 
> > Sure, the patch is here:
> >     http://www.postgresql.org/message-id/20130629025033.GI13790@momjian.us
> > and the idea of keeping what we have is stated here:
> >     http://www.postgresql.org/message-id/51D1E482.5090602@gmx.net
> 
> Perhaps I shouldn't be putting words in Peter's mouth, but my reading of
> his complaint was that he didn't think you'd mapped the pg_ctl failure
> conditions to LSB status codes very well.  That's not necessarily a vote
> against the abstract idea of making it more LSB-compliant.
> 
> But it seems like we might have to go through it case-by-case to argue out
> what's the right error code for each case ... and I'm not sure anybody
> thinks it's worth that much effort.

Yes, I think the question was whether the effort was worth it.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_ctl status with nonexistent data directory

From
Bruce Momjian
Date:
On Fri, Mar  7, 2014 at 09:07:24AM +0530, Amit Kapila wrote:
> One reason could be that as we are already returning special exit code
> for 'status' option of pg_ctl (exit(3)), this seems to be inline with it as this
> also get called during status command.
>
> Also in the link sent by you in another mail, I could see some statement
> which indicates it is more important to return correct codes in case of
> status which sounds a good reason to me.
>
> Link and statement
> http://www.postgresql.org/message-id/51D1E482.5090602@gmx.net
>
> "The "status" case is different, because there the exit code
> can be passed out by the init script directly."
>
> If we just want to handle correct exit codes for "status" option, then
> may be we need to refactor the code a bit to ensure the same.

OK, done with the attached patch  Three is returned for status, but one
for other cases.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment

Re: pg_ctl status with nonexistent data directory

From
Amit Kapila
Date:
On Fri, Mar 7, 2014 at 9:41 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, Mar  7, 2014 at 09:07:24AM +0530, Amit Kapila wrote:
>> One reason could be that as we are already returning special exit code
>> for 'status' option of pg_ctl (exit(3)), this seems to be inline with it as this
>> also get called during status command.
>>
>> Also in the link sent by you in another mail, I could see some statement
>> which indicates it is more important to return correct codes in case of
>> status which sounds a good reason to me.
>>
>> Link and statement
>> http://www.postgresql.org/message-id/51D1E482.5090602@gmx.net
>>
>> "The "status" case is different, because there the exit code
>> can be passed out by the init script directly."
>>
>> If we just want to handle correct exit codes for "status" option, then
>> may be we need to refactor the code a bit to ensure the same.
>
> OK, done with the attached patch  Three is returned for status, but one
> for other cases.

I think it would have been better if it return status as 4 for cases where
directory/file is not accessible (current new cases added by this patch).

I think status 3 should be only return only when the data directory is valid
and pid file is missing, because in script after getting this code, the next
thing they will probably do is to start the server which doesn't seem a
good fit for newly added cases.

Other than that patch seems fine to me.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: pg_ctl status with nonexistent data directory

From
Bruce Momjian
Date:
On Fri, Mar  7, 2014 at 07:18:04PM +0530, Amit Kapila wrote:
> > OK, done with the attached patch  Three is returned for status, but one
> > for other cases.
>
> I think it would have been better if it return status as 4 for cases where
> directory/file is not accessible (current new cases added by this patch).
>
> I think status 3 should be only return only when the data directory is valid
> and pid file is missing, because in script after getting this code, the next
> thing they will probably do is to start the server which doesn't seem a
> good fit for newly added cases.

OK, I have updated the attached patch to reflect this, and added a C
comment.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment

Re: pg_ctl status with nonexistent data directory

From
Amit Kapila
Date:
On Fri, Mar 7, 2014 at 7:59 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, Mar  7, 2014 at 07:18:04PM +0530, Amit Kapila wrote:
>> > OK, done with the attached patch  Three is returned for status, but one
>> > for other cases.
>>
>> I think it would have been better if it return status as 4 for cases where
>> directory/file is not accessible (current new cases added by this patch).
>>
>> I think status 3 should be only return only when the data directory is valid
>> and pid file is missing, because in script after getting this code, the next
>> thing they will probably do is to start the server which doesn't seem a
>> good fit for newly added cases.
>
> OK, I have updated the attached patch to reflect this, and added a C
> comment.

This is fine, do you think we should mention about this in docs?
I have added one line in docs (updated patch attached), if you think
it makes sense then add it otherwise ignore it.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: pg_ctl status with nonexistent data directory

From
Bruce Momjian
Date:
On Sat, Mar  8, 2014 at 09:08:31AM +0530, Amit Kapila wrote:
> On Fri, Mar 7, 2014 at 7:59 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Fri, Mar  7, 2014 at 07:18:04PM +0530, Amit Kapila wrote:
> >> > OK, done with the attached patch  Three is returned for status, but one
> >> > for other cases.
> >>
> >> I think it would have been better if it return status as 4 for cases where
> >> directory/file is not accessible (current new cases added by this patch).
> >>
> >> I think status 3 should be only return only when the data directory is valid
> >> and pid file is missing, because in script after getting this code, the next
> >> thing they will probably do is to start the server which doesn't seem a
> >> good fit for newly added cases.
> >
> > OK, I have updated the attached patch to reflect this, and added a C
> > comment.
> 
> This is fine, do you think we should mention about this in docs?
> I have added one line in docs (updated patch attached), if you think
> it makes sense then add it otherwise ignore it.

I have applied your version of the patch with slightly different text:
If an accessible data directory is not specified, the process returns anexit status of 4.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +