Thread: BUG #8532: postgres fails to start with timezone-data >=2013e
The following bug has been logged on the website:

Bug reference: 8532
Logged by: Timo Gurr
Email address: timo.gurr@gmail.com
PostgreSQL version: 9.1.10
Operating system: Gentoo Linux (64bit, kernel 3.11.0, glibc 2.17)
Description:

From the timezone-data NEWS:

Release 2013e - 2013-09-19 23:50:04 -0700
Changes affecting the build procedure
When building the 'posix' or 'right' subdirectories, if the
subdirectory would be a copy of the default subdirectory, it is
now made a symbolic link if that is supported. This saves about
2 MB of file system space.

This change breaks postgres, so with a recent enough timezone-data package installed on the system, postgres fails to start:

/var/lib/postgresql/9.1/data/postmaster.log
FATAL:  exceeded maxAllocatedDescs (16) while trying to open directory "/usr/share/zoneinfo"

# ls -la /usr/share/zoneinfo/
lrwxrwxrwx 1 root root 1 Oct 16 12:06 posix -> .

Gentoo has a downstream bugreport about it stating the problem should be fixed on the postgres side:
https://bugs.gentoo.org/show_bug.cgi?id=486556

Also found on the net:
http://blog.endpoint.com/2013/06/debugging-obscure-postgres-problems.html

The mentioned workaround of manually removing the symlink lets postgres start fine again.
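The layout in the report can be reproduced in a scratch directory. The following is a minimal sketch, assuming a POSIX system where unprivileged symlink creation works; the helper names (`make_looping_zoneinfo`, `resolves_to_self`) and the dummy `UTC` file are made up for illustration and are not part of PostgreSQL or the tz package. It shows that with `posix -> .` in place, every `posix/posix/...` path is just another name for the same directory, which is what makes a naive recursive scan never terminate on its own.

```python
import os
import tempfile


def make_looping_zoneinfo(root):
    """Create root/zoneinfo with one dummy zone file and a 'posix -> .' symlink."""
    zoneinfo = os.path.join(root, "zoneinfo")
    os.mkdir(zoneinfo)
    # Stand-in for a compiled zone file; the content is irrelevant here.
    with open(os.path.join(zoneinfo, "UTC"), "wb") as f:
        f.write(b"TZif")
    # The problematic entry from the report: a symlink to its own directory.
    os.symlink(".", os.path.join(zoneinfo, "posix"))
    return zoneinfo


def resolves_to_self(zoneinfo, levels):
    """Return True if zoneinfo/posix/posix/... (levels deep) is zoneinfo itself."""
    nested = os.path.join(zoneinfo, *(["posix"] * levels))
    return os.path.samefile(nested, zoneinfo)


with tempfile.TemporaryDirectory() as tmp:
    zi = make_looping_zoneinfo(tmp)
    same_at_1 = resolves_to_self(zi, 1)
    same_at_5 = resolves_to_self(zi, 5)
    # The same zone file is reachable under arbitrarily many names.
    utc_visible = os.path.exists(os.path.join(zi, "posix", "posix", "UTC"))
    print(same_at_1, same_at_5, utc_visible)
```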
On 16.10.2013 13:09, timo.gurr@gmail.com wrote:
> The following bug has been logged on the website:
>
> Bug reference: 8532
> Logged by: Timo Gurr
> Email address: timo.gurr@gmail.com
> PostgreSQL version: 9.1.10
> Operating system: Gentoo Linux (64bit, kernel 3.11.0, glibc 2.17)
> Description:
>
> From the timezone-data NEWS:
>
> Release 2013e - 2013-09-19 23:50:04 -0700
> Changes affecting the build procedure
> When building the 'posix' or 'right' subdirectories, if the
> subdirectory would be a copy of the default subdirectory, it is
> now made a symbolic link if that is supported. This saves about
> 2 MB of file system space.
>
> This change breaks postgres, so with a recent enough timezone-data
> package installed on the system, postgres fails to start:
>
> /var/lib/postgresql/9.1/data/postmaster.log
> FATAL:  exceeded maxAllocatedDescs (16) while trying to open directory
> "/usr/share/zoneinfo"
>
> # ls -la /usr/share/zoneinfo/
> lrwxrwxrwx 1 root root 1 Oct 16 12:06 posix -> .
>
> Gentoo has a downstream bugreport about it stating the problem should be
> fixed on the postgres side:
> https://bugs.gentoo.org/show_bug.cgi?id=486556

When you download the vanilla timezone sources and install, the directory layout looks different:

~/tz ((2013e))$ make -s install DESTDIR=foo TZDIR=/usr/share/zoneinfo
ar: creating foo/usr/local/lib/libtz.a
mkdir: cannot create directory 'foo/usr/local': File exists
mkdir: cannot create directory 'foo/usr/local': File exists
~/tz ((2013e))$ ls -l foo/usr/share/
total 8
drwxr-xr-x 19 heikki heikki 4096 Oct 21 12:48 zoneinfo
drwxr-xr-x 19 heikki heikki 4096 Oct 21 12:48 zoneinfo-leaps
lrwxrwxrwx 1 heikki heikki 8 Oct 21 12:48 zoneinfo-posix -> zoneinfo

There is no 'posix' symlink inside 'zoneinfo'.
The zoneinfo git repository says that this layout was adopted in the upstream library a long time ago:

> commit 77e3dfe1a7b7e14e9f252fc628a5d405c35b6444
> Author: Arthur David Olson <ado@elsie>
> Date: Mon May 25 13:04:43 1998 -0400
>
> Eggert mod
>
> SCCS-file: Makefile
> SCCS-SID: 7.66
>
> diff --git a/Makefile b/Makefile
> index a8d8067..ae414b4 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -293,10 +293,19 @@ posix_only: zic $(TDATA)
>  right_only: zic leapseconds $(TDATA)
>  	$(ZIC) -y $(YEARISTYPE) -d $(TZDIR) -L leapseconds $(TDATA)
>
> +# In earlier versions of this makefile, the other two directories were
> +# subdirectories of $(TZDIR). However, this led to configuration errors.
> +# For example, with posix_right under the earlier scheme,
> +# TZ='right/Australia/Adelaide' got you localtime with leap seconds,
> +# but gmtime without leap seconds, which led to problems with applications
> +# like sendmail that subtract gmtime from localtime.
> +# Therefore, the other two directories are now siblings of $(TZDIR).
> +# You must replace all of $(TZDIR) to switch from not using leap seconds
> +# to using them, or vice versa.
>  other_two: zic leapseconds $(TDATA)
> -	$(ZIC) -y $(YEARISTYPE) -d $(TZDIR)/posix -L /dev/null $(TDATA)
> +	$(ZIC) -y $(YEARISTYPE) -d $(TZDIR)-posix -L /dev/null $(TDATA)
>  	$(ZIC) -y $(YEARISTYPE) \
> -		-d $(TZDIR)/right -L leapseconds $(TDATA)
> +		-d $(TZDIR)-right -L leapseconds $(TDATA)
>
>  posix_right: posix_only other_two
>

However, Gentoo seems to carry a patch that reverts that commit:

http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/sys-libs/timezone-data/files/timezone-data-2013f-makefile.patch?revision=1.1&view=markup

That patch conflicts with the upstream Makefile change to create the "other" directory as a symlink. With the vanilla zoneinfo sources, the symlink is fine, but by putting 'posix' inside 'zoneinfo' directory, the Gentoo-specific patch creates that infinite recursion situation.
Gentoo isn't alone in doing this: my Debian system has a similar layout, with the 'posix' directory inside /usr/share/zoneinfo, rather than as a sibling. I'm not sure if Debian will have this problem, though. If I'm reading the debian rules file correctly, they're not relying on the upstream "make install" to create the 'posix' and 'right' directories, but call zic directly.

In summary, I'd call this a packaging bug.

That said, the error message you get from PostgreSQL isn't very user-friendly. There is a check on recursion depth in the timezone traversing code, but apparently it trips on another limit first, on the number of directory handles that can be open at a time.

Also, I don't understand how this is preventing PostgreSQL from starting up. AFAICS the traversal of the timezones is only done when you query the pg_timezone_names system view. Not at startup.

- Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 16.10.2013 13:09, timo.gurr@gmail.com wrote:
>> # ls -la /usr/share/zoneinfo/
>> lrwxrwxrwx 1 root root 1 Oct 16 12:06 posix -> .

> That patch conflicts with the upstream Makefile change to create the
> "other" directory as a symlink. With the vanilla zoneinfo sources, the
> symlink is fine, but by putting 'posix' inside 'zoneinfo' directory, the
> Gentoo-specific patch creates that infinite recursion situation.

I agree, this is an egregious packaging bug. Programs should be able to enumerate the timezone database without running into infinite recursion.

> That said, the error message you get from PostgreSQL isn't very
> user-friendly. There is a check on recursion depth in the timezone
> traversing code, but apparently it trips on another limit first, on the
> number of directory handles that can be open at a time.

> Also, I don't understand how this is preventing PostgreSQL from starting
> up. AFAICS the traversal of the timezones is only done when you query
> the pg_timezone_names system view. Not at startup.

Keep in mind that in 9.2 and later, we traverse the timezone tree in initdb to set the timezone GUC. Before that, we would do it in postmaster startup --- but only if we didn't find a TZ variable in the environment nor a setting in postgresql.conf.

I tried to reproduce this in HEAD by inserting a bogus symlink into the installation timezone tree and running initdb. Curiously, it did not fail, though initdb took rather longer than expected. After debugging I realized that scan_available_timezones() was in fact recursing deeper and deeper into the posix/posix/posix/posix/... nest --- but eventually, the constructed filename exceeds MAXPGPATH, and we truncate it, and fail to open the truncated filename, so the recursion stops. (And you don't get any error message, unless you compiled with DEBUG_IDENTIFY_TIMEZONE.)
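The stopping condition described above is a matter of arithmetic on the path length. As a rough sketch, not PostgreSQL code: the function below counts how many `/posix` components can be appended before the name no longer fits in a fixed-size C buffer. The buffer size of 1024 for MAXPGPATH is an assumption (the thread does not state the value), and the base path is borrowed from the bug report rather than from the installation tree used in the initdb test, so the resulting depth is only indicative.

```python
def depth_until_truncation(base, component, buf_size):
    """Count how many times '/component' can be appended to base before
    the path no longer fits in a buf_size-byte C string buffer
    (one byte is reserved for the terminating NUL)."""
    path = base
    depth = 0
    while len(path) + 1 + len(component) <= buf_size - 1:
        path = path + "/" + component
        depth += 1
    return depth


# Assumption: MAXPGPATH is 1024; the thread does not give the value.
ASSUMED_MAXPGPATH = 1024

depth = depth_until_truncation("/usr/share/zoneinfo", "posix", ASSUMED_MAXPGPATH)
print(depth)
```

One level past that depth the constructed name is cut short, the truncated name does not exist, and the open fails, which would explain why the scan ends quietly instead of with an error.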
Also, the implementation in initdb isn't vulnerable to running out of descriptors because it sucks in an entire directory at a time with pgfnames(), so it doesn't have a descriptor open when it recurses. (Instead, it consumes a lot of memory --- but it looks like still only about 10MB worth.)

In 9.1, the reason you see the maxAllocatedDescs complaint is that the postmaster tries to set the timezone before it's increased max_safe_fds, so it won't increase maxAllocatedDescs past 16. Enumerating the zones in a regular backend would almost certainly report the timezone recursion error instead. (I am kinda wondering why maxAllocatedDescs == 16 isn't enough to get to a recursion error at depth 10, but maybe there are a few other files open when this happens.)

Basically, I don't think we should do anything about this. Packaging the TZ database like that is completely brain-dead, and Gentoo needs to fix it, not tell us we're doing something wrong. The consequences of their bug aren't too serious in modern PG releases anyway. (Given what I know of their packaging policies, I have to wonder why they're still shipping 9.1 rather than the bleeding edge...)

regards, tom lane
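The difference between the two traversal styles can be made concrete by counting how many directory handles are open at once. This is a sketch in Python rather than the actual C code: `HandleCounter` and both walk functions are hypothetical names, and the counter only stands in for a descriptor budget such as maxAllocatedDescs. The first walk recurses while the parent directory is still open; the second reads the whole listing, closes the directory, and only then descends, which is the pattern attributed to pgfnames() above.

```python
import os
import tempfile


class HandleCounter:
    """Tracks how many directory handles are open at the same time."""

    def __init__(self):
        self.open_now = 0
        self.peak = 0

    def acquire(self):
        self.open_now += 1
        self.peak = max(self.peak, self.open_now)

    def release(self):
        self.open_now -= 1


def walk_holding_handle(path, counter):
    """Recurse while the parent's directory handle is still open."""
    counter.acquire()
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    walk_holding_handle(entry.path, counter)
    finally:
        counter.release()


def walk_list_first(path, counter):
    """Read the whole directory, close it, and only then recurse."""
    counter.acquire()
    try:
        with os.scandir(path) as entries:
            subdirs = [e.path for e in entries if e.is_dir(follow_symlinks=False)]
    finally:
        counter.release()
    for sub in subdirs:
        walk_list_first(sub, counter)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b", "c", "d"))
    held, listed = HandleCounter(), HandleCounter()
    walk_holding_handle(tmp, held)
    walk_list_first(tmp, listed)
    print(held.peak, listed.peak)
```

In the first style the peak grows with nesting depth, so a small budget is exhausted on a deep or looping tree; in the second the peak stays at one, at the cost of holding each directory listing in memory.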
On 2013-11-10 20:57, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> > On 16.10.2013 13:09, timo.gurr@gmail.com wrote:
> >> # ls -la /usr/share/zoneinfo/
> >> lrwxrwxrwx 1 root root 1 Oct 16 12:06 posix -> .
>
> > That patch conflicts with the upstream Makefile change to create the
> > "other" directory as a symlink. With the vanilla zoneinfo sources, the
> > symlink is fine, but by putting 'posix' inside 'zoneinfo' directory, the
> > Gentoo-specific patch creates that infinite recursion situation.
>
> I agree, this is an egregious packaging bug. Programs should be able to
> enumerate the timezone database without running into infinite recursion.

I agree. This looks like it's rather recent. I'm waiting for a reply from the toolchain herd as to why they're effectively reverting that patch.

> Basically, I don't think we should do anything about this. Packaging
> the TZ database like that is completely brain-dead, and Gentoo needs
> to fix it, not tell us we're doing something wrong. The consequences
> of their bug aren't too serious in modern PG releases anyway. (Given
> what I know of their packaging policies, I have to wonder why they're
> still shipping 9.1 rather than the bleeding edge...)
>
> regards, tom lane
>

Respectfully, I disagree. PostgreSQL should be able to handle cyclic directory structures gracefully. More basic utilities, like ls and du, won't die because of it.

And, we've been keeping pace rather well. We had 9.3.1 in the tree before you wrote your email. :p

--
Mr. Aaron W. Swenson
Gentoo Linux Developer
PostgreSQL Herd Bull
Email : titanofold@gentoo.org
GnuPG FP : 2C00 7719 4F85 FB07 A49C 0E31 5713 AA03 D1BB FDA0
GnuPG ID : D1BBFDA0
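For reference, one common way a directory walker copes with cycles is to remember each directory's (device, inode) pair and refuse to enter the same one twice. The sketch below illustrates that general technique only; it is not how PostgreSQL, ls, or du are implemented, and `list_zone_files` and the sample zone names are invented for the example.

```python
import os
import tempfile


def list_zone_files(root):
    """Walk root, following symlinked directories, but enter each real
    directory only once, identified by its (device, inode) pair."""
    seen = set()
    found = []

    def visit(path, rel):
        st = os.stat(path)  # follows symlinks, so 'posix -> .' maps to root
        key = (st.st_dev, st.st_ino)
        if key in seen:
            return  # already visited under another name: a cycle or an alias
        seen.add(key)
        for name in sorted(os.listdir(path)):
            child = os.path.join(path, name)
            child_rel = name if not rel else rel + "/" + name
            if os.path.isdir(child):
                visit(child, child_rel)
            else:
                found.append(child_rel)

    visit(root, "")
    return found


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "Europe"))
    for name in ("UTC", os.path.join("Europe", "Helsinki")):
        open(os.path.join(tmp, name), "wb").close()
    os.symlink(".", os.path.join(tmp, "posix"))  # the loop from the report
    zones = list_zone_files(tmp)
    print(zones)
```

A side effect of this approach is that a harmless alias, such as a symlink to a sibling directory, is also skipped, so names reachable only through the alias would not be reported. Whether that is acceptable depends on what the caller needs from the enumeration.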