Thread: Bug #882: Cannot manually log in to database.

Bug #882: Cannot manually log in to database.

From
pgsql-bugs@postgresql.org
Date:
Ben Kinsey (benk@aiinet.com; sk9887@sbc.com) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
Cannot manually log in to database.

Long Description
We are receiving the following error when trying to manually log in to the database:

okapview# /opt/pgsql-7.1.3/bin/psql -U postgres -d AppliedView
psql: connectDBStart() -- connect() failed: No such file or directory
        Is the postmaster running locally
        and accepting connections on Unix socket '/tmp/.s.PGSQL.5432'?

We searched your documentation, and all it said was to verify that the postmaster daemon is running, and it is already
running on the system.  We have a daemon process that is connected to the database, and its connection is not refused.
Only psql command-line logins are refused.

Stopping and starting the postmaster daemon clears up this problem, but this problem creeps up about 2 times a week,
and is a major annoyance.

Sample Code


No file was uploaded with this report

Re: Bug #882: Cannot manually log in to database.

From
Tom Lane
Date:
pgsql-bugs@postgresql.org writes:
> okapview# /opt/pgsql-7.1.3/bin/psql -U postgres -d AppliedView
> psql: connectDBStart() -- connect() failed: No such file or directory
>         Is the postmaster running locally
>         and accepting connections on Unix socket '/tmp/.s.PGSQL.5432'?

> Stopping and starting the postmaster daemon clears up this problem, but this problem creeps up about 2 times a week,
> and is a major annoyance.

Sounds to me like you've got a cron script that removes everything in
tmp about twice a week.  I suggest teaching it not to remove socket
files.  On most Unixen the mod date on a socket file isn't changed by
normal activity, so a tmp-cleaner that only pays attention to the mod
date will mistakenly decide a socket is fair game for removal.

            regards, tom lane

Re: Bug #882: Cannot manually log in to database.

From
Giles Lean
Date:
[ Where *did* that Reply-To: line come from -- it's broken ...

  repl: bad addresses:
    benk@aiinet.com; -- extraneous semi-colon
]

> Stopping and starting the postmaster daemon clears up this problem,
> but this problem creeps up about 2 times a week, and is a major
> annoyance.

Either teach your /tmp cleaner not to clean out the socket files as
Tom Lane suggested, or arrange to update the socket timestamps.  I
think it's easier to just keep updating the timestamps -- then I don't
have to educate each new system administrator.

    utimes("/tmp/.s.PGSQL.5432", (const struct timeval *) 0);

If you can't write that into C, drop me a line, and I'll send you the
code.  Most touch(1) implementations would also do the right thing, so
you could try that too.  Then put whatever solution you choose into
cron, and you're done.

Regards,

Giles
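
[Editorial sketch, not part of the original message.] The one-line utimes()
call suggested above could be wrapped with basic error reporting along these
lines. This is only an illustration: the function name touch_path() is
hypothetical, and whether utimes() is available on a given platform is exactly
the question the rest of the thread takes up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/time.h>

/*
 * Set the access and modification times of path to the current time.
 * Returns 0 on success, or -1 on failure after reporting the error.
 */
int
touch_path(const char *path)
{
    assert(path != NULL);

    /* A null times pointer asks utimes() to use the current time. */
    if (utimes(path, NULL) == -1) {
        perror(path);
        return -1;
    }
    return 0;
}
```

A small program that calls touch_path("/tmp/.s.PGSQL.5432") and is run from
cron more often than the cleaner's age threshold would keep the socket's
timestamp fresh.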

Re: Bug #882: Cannot manually log in to database.

From
Tom Lane
Date:
Giles Lean <giles@nemeton.com.au> writes:
> Either teach your /tmp cleaner not to clean out the socket files as
> Tom Lane suggested, or arrange to update the socket timestamps.  I
> think it's easier to just keep updating the timestamps -- then I don't
> have to educate each new system administrator.

>     utimes("/tmp/.s.PGSQL.5432", (const struct timeval *) 0);

Hm, do you think that's portable?

There is already code in the postmaster to touch the socket lock file
every few minutes, so as to keep tmp-cleaners from zapping it.  (Or at
least there once was; I can't find it right now.)  If we could do the
same for the socket file it'd be really nice.  But I didn't think there
was any portable way to update the mod timestamp on a socket.

            regards, tom lane

Re: Bug #882: Cannot manually log in to database.

From
Giles Lean
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Giles Lean <giles@nemeton.com.au> writes:
>
> >     utimes("/tmp/.s.PGSQL.5432", (const struct timeval *) 0);
>
> Hm, do you think that's portable?

Hm ... yes, actually I do.  I use it on HP-UX, and testing indicates
that it works on FreeBSD, Linux, NetBSD and Tru64 as well.

Thinking about it, a Unix domain socket has an entry in the filesystem
and thus an inode. utimes() operates on the inode so it makes sense to
me that this should Just Work.

While UNIX98 (aka the "Single UNIX Specification, Version 2") talks about a
"file" argument to utimes() it doesn't make any particular mention
about restrictions on what type of file, and the function needs to
work on some non-regular files such as device files to be useful.

> There is already code in the postmaster to touch the socket lock file
> every few minutes, so as to keep tmp-cleaners from zapping it.  (Or at
> least there once was; I can't find it right now.)  If we could do the
> same for the socket file it'd be really nice.  But I didn't think there
> was any portable way to update the mod timestamp on a socket.

I've done some testing today, and the test passed on everything I
tested it on:

    FreeBSD 4.7-RELEASE alpha
    HP-UX B.11.11 9000/800
    HP-UX B.11.22 ia64
    Linux 2.4.18-14 i686              # RedHat Linux 8.0
    Linux 2.4.18-mckinley-smp ia64    # Debian GNU/Linux 3.0
    NetBSD 1.6_STABLE i386
    OSF1 V4.0 alpha                   # Tru64
    OSF1 V5.1 alpha                   # Tru64

It's too hot here today to go outside but even so, that's enough
testing ...

I've attached the code I used.  It was considered to work if utimes()
didn't return an error and if the st_mtime value returned by stat()
changed:

   $ make socket_utimes
   cc -O2   -o socket_utimes socket_utimes.c
   $ ./socket_utimes socket
   utimes() successfully changed a Unix domain socket mtime.
   $ uname -srm
   NetBSD 1.6_STABLE i386

If utimes() works on the other supported platforms that have Unix
domain sockets perhaps we can put the /tmp cleaners to rest for good.

Anyone willing to test AIX, IRIX, MacOS X, Solaris, or SCO Unix?  I
don't expect the Windows ports with or without cygwin will support
Unix domain sockets, so they probably don't need testing. :-)

Regards,

Giles

P.S. http://www.testdrive.hp.com is great for quick portability
testing.  It was a Compaq program that HP has expanded since their
merger.  Highly recommended.

#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    char *path;
    int sock_fd;
    struct sockaddr_un addr;
    struct stat sb_before;
    struct stat sb_after;

    if (argc != 2) {
        fprintf(stderr, "usage: socket_utimes path\n");
        exit(EXIT_FAILURE);
    }
    path = argv[1];

    sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sock_fd == -1) {
        fprintf(stderr, "socket: %s\n", strerror(errno));
        exit(EXIT_FAILURE);
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

#ifndef SUN_LEN
#define SUN_LEN(su) \
    (sizeof(*(su)) - sizeof((su)->sun_path) + strlen((su)->sun_path))
#endif

    if (bind(sock_fd, (struct sockaddr *) &addr, SUN_LEN(&addr)) == -1) {
        fprintf(stderr, "bind: %s\n", strerror(errno));
        exit(EXIT_FAILURE);
    }

    if (stat(path, &sb_before) == -1) {
        fprintf(stderr, "stat: %s: %s\n", path, strerror(errno));
        (void) unlink(path);
        exit(EXIT_FAILURE);
    }

    sleep(2);

    if (utimes(path, (struct timeval *) 0) == -1) {
        fprintf(stderr, "utimes: %s: %s\n", path, strerror(errno));
        (void) unlink(path);
        exit(EXIT_FAILURE);
    }

    if (stat(path, &sb_after) == -1) {
        fprintf(stderr, "stat: %s: %s\n", path, strerror(errno));
        (void) unlink(path);
        exit(EXIT_FAILURE);
    }

    if (sb_before.st_mtime == sb_after.st_mtime) {
        printf("Oops: utimes() failed to change mtime\n");
        (void) unlink(path);
        exit(EXIT_FAILURE);
    }

    printf("utimes() successfully changed a Unix domain socket mtime.\n");
    (void) unlink(path);
    exit(EXIT_SUCCESS);
}





Re: Bug #882: Cannot manually log in to database.

From
Tom Lane
Date:
Giles Lean <giles@nemeton.com.au> writes:
>>> utimes("/tmp/.s.PGSQL.5432", (const struct timeval *) 0);
>>
>> Hm, do you think that's portable?

> Hm ... yes, actually I do.  I use it on HP-UX, and testing indicates
> that it works on FreeBSD, Linux, NetBSD and Tru64 as well.

> Thinking about it, a Unix domain socket has an entry in the filesystem
> and thus an inode. utimes() operates on the inode so it makes sense to
> me that this should Just Work.

Sure, the question was more about whether the system call exists
everywhere.

> I've done some testing today, and the test passed on everything I
> tested it on:

I can add HPUX 10.20, Mac OS X 10.2.3, and a pretty ancient Linux
(kernel 2.0.36, not sure of the exact distro) to the list of stuff
your test program seems to pass on.

> If utimes() works on the other supported platforms that have Unix
> domain sockets perhaps we can put the /tmp cleaners to rest for good.

My feeling is we may as well put it in.  If it turns out we have
platforms without utimes(), we can put in a configure test and #ifdef
it.  If the call doesn't exist or doesn't update the mod time as
expected, we're no worse off than before --- and for platforms where
it does work, this is a big win.

Thanks for looking into it!  I'll work on applying the fix.

            regards, tom lane

Re: Bug #882: Cannot manually log in to database.

From
Tom Lane
Date:
Giles Lean <giles@nemeton.com.au> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>>> utimes("/tmp/.s.PGSQL.5432", (const struct timeval *) 0);
>>
>> Hm, do you think that's portable?

> Hm ... yes, actually I do.  I use it on HP-UX, and testing indicates
> that it works on FreeBSD, Linux, NetBSD and Tru64 as well.

Some digging about on the net revealed that there is a very similar
function utime() that is POSIX-standard, whereas utimes() is not.

Accordingly, I bit the bullet and put in a configure test to see which
one(s) we have.  With any luck, this will hold up through 7.4's port
testing.

            regards, tom lane
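
[Editorial sketch, not part of the original message.] The configure-test
approach described above might end up guarding the call roughly as follows.
The macro names HAVE_UTIME and HAVE_UTIMES and the function touch_file() are
assumptions for illustration; the thread does not show what was actually
committed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Assumed macro names: a configure script would define whichever
 * call it found. Defaulted to utime() here only so that the sketch
 * builds standalone.
 */
#if !defined(HAVE_UTIME) && !defined(HAVE_UTIMES)
#define HAVE_UTIME 1
#endif

#ifdef HAVE_UTIME
#include <utime.h>
#else
#include <sys/time.h>
#endif

/*
 * Update the timestamps of path to the current time using whichever
 * call is available. Returns 0 on success, -1 on failure.
 */
int
touch_file(const char *path)
{
    int rc;

    assert(path != NULL);

#ifdef HAVE_UTIME
    rc = utime(path, NULL);     /* NULL means "set both times to now" */
#else
    rc = utimes(path, NULL);    /* same convention for utimes() */
#endif

    if (rc == -1)
        perror(path);           /* report the failure; the caller decides what to do */
    return rc;
}
```

Preferring utime() when both exist follows the observation above that it is
the POSIX-standard call.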