Re: BUG #8532: postgres fails to start with timezone-data >=2013e - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #8532: postgres fails to start with timezone-data >=2013e
Msg-id 10389.1384135035@sss.pgh.pa.us
In response to Re: BUG #8532: postgres fails to start with timezone-data >=2013e  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: BUG #8532: postgres fails to start with timezone-data >=2013e  ("Aaron W. Swenson" <titanofold@gentoo.org>)
List pgsql-bugs

Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 16.10.2013 13:09, timo.gurr@gmail.com wrote:
>> # ls -la /usr/share/zoneinfo/
>> lrwxrwxrwx  1 root root     1 Oct 16 12:06 posix ->  .

> That patch conflicts with the upstream Makefile change to create the
> "other" directory as a symlink. With the vanilla zoneinfo sources, the
> symlink is fine, but by putting 'posix' inside 'zoneinfo' directory, the
> Gentoo-specific patch creates that infinite recursion situation.

I agree, this is an egregious packaging bug.  Programs should be able to
enumerate the timezone database without running into infinite recursion.

> That said, the error message you get from PostgreSQL isn't very
> user-friendly. There is a check on recursion depth in the timezone
> traversing code, but apparently it trips on another limit first, on the
> number of directory handles that can be open at a time.

> Also, I don't understand how this is preventing PostgreSQL from starting
> up. AFAICS the traversal of the timezones is only done when you query
> the pg_timezone_names system view. Not at startup.

Keep in mind that in 9.2 and later, we traverse the timezone tree in
initdb to set the timezone GUC.  Before that, we would do it in postmaster
startup --- but only if we didn't find a TZ variable in the environment
nor a setting in postgresql.conf.

I tried to reproduce this in HEAD by inserting a bogus symlink into
the installation timezone tree and running initdb.  Curiously, it did not
fail, though initdb took rather longer than expected.  After debugging
I realized that scan_available_timezones() was in fact recursing deeper
and deeper into the posix/posix/posix/posix/... nest --- but eventually,
the constructed filename exceeds MAXPGPATH, and we truncate it, and
fail to open the truncated filename, so the recursion stops.
(And you don't get any error message, unless you compiled with
DEBUG_IDENTIFY_TIMEZONE.)  Also, the implementation in initdb isn't
vulnerable to running out of descriptors because it sucks in an entire
directory at a time with pgfnames(), so it doesn't have a descriptor
open when it recurses.  (Instead, it consumes a lot of memory --- but
it looks like still only about 10MB worth.)
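
The termination behavior described above can be sketched in a few lines of
Python. This is only an illustration, not the initdb code: scan_zones, demo,
and the max_path parameter are hypothetical names, os.listdir() stands in for
pgfnames() (it reads a whole directory and closes it before any recursion),
and a simple length check stands in for the MAXPGPATH truncation that makes
the over-long name fail to open.

```python
import os
import tempfile


def scan_zones(root, max_path, rel="", depth=0, found=None, stats=None):
    """List one directory completely, then recurse into subdirectories."""
    found = [] if found is None else found
    stats = {"max_depth": 0} if stats is None else stats
    path = os.path.join(root, rel) if rel else root
    # Stand-in for the MAXPGPATH effect: an over-long name cannot be
    # opened, so the scan gives up on it without reporting anything.
    if len(path) >= max_path:
        return found, stats
    try:
        # The whole directory is read and closed here, so no directory
        # handle is held open while recursing.
        names = sorted(os.listdir(path))
    except OSError:
        return found, stats
    stats["max_depth"] = max(stats["max_depth"], depth)
    for name in names:
        child_rel = f"{rel}/{name}" if rel else name
        if os.path.isdir(os.path.join(root, child_rel)):  # follows symlinks
            scan_zones(root, max_path, child_rel, depth + 1, found, stats)
        else:
            found.append(child_rel)
    return found, stats


def demo():
    """Build a tiny tree with a 'posix -> .' symlink and scan it."""
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "UTC"), "w").close()
        os.symlink(".", os.path.join(root, "posix"))
        # A small cap keeps the demo short; the real limit is MAXPGPATH.
        found, stats = scan_zones(root, max_path=len(root) + 40)
        return found, stats["max_depth"]
```

Calling demo() returns the same zone file once per level of the
posix/posix/... nest, plus the deepest level reached before the path-length
cap ended the recursion. The list of names grows with depth, which is the
memory cost mentioned above.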

In 9.1, the reason you see the maxAllocatedDescs complaint is that the
postmaster tries to set the timezone before it's increased max_safe_fds,
so it won't increase maxAllocatedDescs past 16.  Enumerating the zones
in a regular backend would almost certainly report the timezone recursion
error instead.  (I am kinda wondering why maxAllocatedDescs == 16 isn't
enough to get to a recursion error at depth 10, but maybe there are a
few other files open when this happens.)
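
The interplay of the two limits can be modeled with a little arithmetic.
The sketch below is a toy model under stated assumptions, not PostgreSQL's
actual accounting: first_limit_hit and its parameters are hypothetical, each
directory level is assumed to keep exactly one handle open while its children
are scanned, and the recursion check is assumed to fire once the depth
exceeds max_depth.

```python
def first_limit_hit(max_descs=16, max_depth=10, already_open=0):
    """Return (limit_name, depth) for whichever limit trips first.

    Hypothetical model: level d of the traversal needs d + 1 directory
    handles in total, on top of any handles that are already in use.
    """
    depth = 0
    while True:
        if depth > max_depth:
            return ("recursion", depth)
        if already_open + depth >= max_descs:
            # No free slot is left for this level's directory handle.
            return ("descriptors", depth)
        depth += 1  # handle stays open; descend into the looping symlink


if __name__ == "__main__":
    for n in (0, 5, 6):
        print(n, first_limit_hit(already_open=n))
```

In this model, 16 slots are enough to reach the recursion check when nothing
else is open, and the descriptor limit only wins once several slots are
already taken. That is consistent with the guess that a few other files are
open at that point, though it does not confirm it.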

Basically, I don't think we should do anything about this.  Packaging
the TZ database like that is completely brain-dead, and Gentoo needs
to fix it, not tell us we're doing something wrong.  The consequences
of their bug aren't too serious in modern PG releases anyway.  (Given
what I know of their packaging policies, I have to wonder why they're
still shipping 9.1 rather than the bleeding edge...)

            regards, tom lane
