Re: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error
Date
Msg-id 25272.1241214973@sss.pgh.pa.us
In response to Re: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error  (Mark <admin@asarian-host.net>)
List pgsql-bugs
I wrote:
> I was under the impression that there was some code in there to complain
> if the path-finding code failed, but maybe it's being executed too late.

I looked at this a bit more, and found that there is no such code.
Mark's complaint is easy to reproduce if you move (or hardlink) the
postgres executable into some other directory away from the share
directory and then try to start it on a valid data directory.  (If
it doesn't find postgresql.conf it'll fail sooner.)

initdb behaves a bit more sanely under similar circumstances:

$ initdb
initdb: file "/home/tgl/trial/share/postgresql/postgres.bki" does not exist
This might mean you have a corrupted installation or identified
the wrong directory with the invocation option -L.
$

The postmaster however is much less dependent on the contents of the
share dir than initdb is, so the first time it really notices something
is wrong is when it tries to find the file that the
timezone_abbreviations GUC is supposed to reference.  And when we get
there, in perhaps an overabundance of brevity we intentionally don't
report the file path:

    get_share_path(my_exec_path, share_path);
    snprintf(file_path, sizeof(file_path), "%s/timezonesets/%s",
             share_path, filename);
    tzFile = AllocateFile(file_path, "r");
    if (!tzFile)
    {
        /* at level 0, if file doesn't exist, guc.c's complaint is enough */
        if (errno != ENOENT || depth > 0)
            ereport(tz_elevel,
                    (errcode_for_file_access(),
                     errmsg("could not read time zone file \"%s\": %m",
                            filename)));
        return -1;
    }

So there are a number of things we could consider doing about this,
including just tweaking the above bit of code.  But that only helps
so long as this is the first such reference to fail during startup
--- which is surely pretty coincidental.
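For illustration only, the "tweak" could amount to putting the full path in the message rather than just the file name. The following is a standalone sketch in plain C, not the backend code: it uses fopen and strerror in place of AllocateFile and ereport, and the names open_tz_file_verbose and SKETCH_MAXPATH are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAXPATH 1024     /* hypothetical limit, stands in for MAXPGPATH */

/*
 * Hypothetical helper: open <share_path>/timezonesets/<filename> for reading.
 * On failure, returns NULL and writes a message containing the FULL path
 * into errbuf, so a wrong share directory is visible to the user.
 */
static FILE *
open_tz_file_verbose(const char *share_path, const char *filename,
                     char *errbuf, size_t errlen)
{
    char        file_path[SKETCH_MAXPATH];
    FILE       *fp;
    int         n;

    n = snprintf(file_path, sizeof(file_path), "%s/timezonesets/%s",
                 share_path, filename);
    if (n < 0 || (size_t) n >= sizeof(file_path))
    {
        /* the assembled path did not fit in the buffer */
        snprintf(errbuf, errlen,
                 "path too long for time zone file \"%s\"", filename);
        return NULL;
    }

    fp = fopen(file_path, "r");
    if (fp == NULL)
        snprintf(errbuf, errlen,
                 "could not read time zone file \"%s\": %s",
                 file_path, strerror(errno));
    return fp;
}
```

As noted, this only helps while the timezone file happens to be the first share-dir reference to fail.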

What I'm inclined to do is modify PostmasterMain so that immediately
after find_my_exec, it checks that get_share_path returns the name of
a readable directory.  (I see that it's already invoking get_pkglib_path
at that point, but not checking that the result points to anything ---
maybe we should check that too?)  The error message would then be
something similar to what initdb is saying above, i.e., a misconfigured
installation.  Maybe initdb should have an explicit test of this
nature too, because the message quoted above could still be
misinterpreted.
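
A rough sketch of the kind of check meant here, written as standalone C rather than as a patch: it uses opendir to test that a path names a readable directory and prints an initdb-style complaint if not. The function names check_readable_dir and verify_install_dir and the message wording are hypothetical; the real check would take its path from get_share_path and report through the postmaster's usual error machinery.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 0 if "path" names a directory that can be
 * opened for reading, else -1 with errno left as set by opendir
 * (ENOENT, ENOTDIR, EACCES, ...).
 */
static int
check_readable_dir(const char *path)
{
    DIR        *dir = opendir(path);

    if (dir == NULL)
        return -1;
    closedir(dir);
    return 0;
}

/*
 * Hypothetical startup check: complain about a misconfigured installation
 * up front, instead of failing later on some unrelated file lookup.
 * "what" is a label such as "share" or "pkglib".
 */
static int
verify_install_dir(const char *progname, const char *what, const char *path)
{
    if (check_readable_dir(path) != 0)
    {
        fprintf(stderr,
                "%s: could not access %s directory \"%s\": %s\n"
                "This might mean you have a corrupted installation, or the\n"
                "executable was moved away from the rest of the installation.\n",
                progname, what, path, strerror(errno));
        return -1;
    }
    return 0;
}
```

Called once for the share path and once for the pkglib path right after the executable's location is known, this would stop startup with a message that names the directory actually being looked at.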

Or maybe this is more work than it's worth.  I don't recall many similar
complaints previously.

Comments?

            regards, tom lane