Thread: too-many-open-files log file entries when vacuuming under Solaris

too-many-open-files log file entries when vacuuming under Solaris

From
"Raschick, Hartmut"
Date:

Dear all,

recently we have seen many occurrences of “out of file descriptors: Too many open files; release and retry” in our Postgres log files, every night when a “vacuum full analyze” is run. After some digging into the code, we found that Postgres potentially tries to open as many as a pre-determined maximum number of file descriptors when vacuuming. That number is the lesser of the value from the configuration file (max_files_per_process) and the one determined at start-up by “src/backend/storage/file/fd.c::count_usable_fds()”.

Under Solaris, it would seem, determining that number via dup(0) is not sufficient, as the number that actually matters might be/is the number of usable stdio stream file descriptors (up to and including Solaris 10, at least). Closing the least recently used file descriptor might therefore not resolve a temporary shortage, since a descriptor below 256 is needed.

This can be worked around by setting/leaving the descriptor limit at 256, or by changing the postgresql.conf setting accordingly. Still, the function for determining the maximum number does not appear to work as intended under Solaris. One might try using fopen() instead of dup(), or handle stream and normal file descriptors differently (including moving standard file descriptors above 255 to leave room for stream ones). Maybe, though, all this is not worth the effort; then it might at least be a good idea to mention the limitations/specialties in the platform-specific notes (e.g. keep ulimit at 256 maximum).

 

cheers

hardy

 

 

Hartmut Raschick

Network Management Solutions

-----------------------------------

KEYMILE GmbH

Wohlenbergstr. 3

D-30175 Hannover, Germany

 

Phone: +49 (0)511 6747-564

Fax:   +49 (0)511 6747-777

mailto:Hartmut.Raschick@keymile.com

http://www.keymile.com

 

<< KEYMILE – because connectivity matters >>

 

Geschäftsführer/Managing Directors: Björn Claaßen, Michael Breyer, Axel Föry - Rechtsform der Gesellschaft/Legal structure: GmbH, Sitz/Registered office: Hannover HRB 61069, Amtsgericht/Local court Hannover, USt-Id. Nr./VAT-Reg.-No.: DE 812282795, WEEE-Reg.-No.: DE 59336750

 

Re: too-many-open-files log file entries when vacuuming under Solaris

From
Tom Lane
Date:
"Raschick, Hartmut" <Hartmut.Raschick@keymile.com> writes:
> recently we have seen a lot of occurrences of "out of file descriptors:
> Too many open files; release and retry" in our postgres log files, every
> night when a "vacuum full analyze" is run.  After some digging into the
> code we found that postgres potentially tries to open as many as a
> pre-determined maximum number of file descriptors when vacuuming. That
> number is the lesser of the one from the configuration file
> (max_files_per_process) and the one determined at start-up by
> "src/backend/storage/file/fd.c::count_usable_fds()". Under Solaris now,
> it would seem, finding out that number via dup(0) is not sufficient, as
> the actual number of interest might be/is the number of usable stream
> file descriptors (up until Solaris 10, at least). Also, closing the last
> recently used file descriptor might therefore not solve a temporary
> problem (as something below 256 is needed). Now, this can be fixed by
> setting/leaving the descriptor limit at 256 or changing the
> postgresql.conf setting accordingly. Still, the function for determining
> the max number is not working as intended under Solaris, it would
> appear. One might try using fopen() instead of dup() or have a different
> handling for stream and normal file descriptors (including moving
> standard file descriptors to above 255 to leave room for stream
> ones). Maybe though, all this is not worth the effort; then it might
> perhaps be a good idea to mention the limitations/specialties in the
> platform specific notes (e.g. have u/limit at 256 maximum).

TBH this sounds like unfounded speculation.  AFAIK a Postgres backend will
not open anything but regular files after its initial startup.  I'm not
sure what a "stream" is on Solaris, but guessing that it refers to pipes
or sockets, I don't think we have a problem with an OS restriction that
those be below FD 256.  In any case, if we did, it would presumably show
up as errors not release-and-retry events.

Our usual experience is that you get release-and-retry log messages when
the OS is up against the system-wide open-file limit rather than the
per-process limit (ie, the underlying error code is ENFILE not EMFILE).
I don't know exactly how Solaris strerror() spells those codes so it's
difficult to tell from your reported log message which case is happening.
If it is the system-wide limit that's at issue, then of course the dup(0)
loop isn't likely to find it, and adjusting max_files_per_process (or
maybe better, reducing max_connections) is the expected solution.

            regards, tom lane


Re: too-many-open-files log file entries when vacuuming under Solaris

From
"Raschick, Hartmut"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, March 5, 2014 9:17 PM
> To: Raschick, Hartmut
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] too-many-open-files log file entries when
> vacuuming under Solaris
>
> "Raschick, Hartmut" <Hartmut.Raschick@keymile.com> writes:
> > recently we have seen a lot of occurrences of "out of file
> descriptors:
> > Too many open files; release and retry" in our postgres log files,
...
..
.
> > in the platform specific notes (e.g. have u/limit at 256 maximum).
>
> TBH this sounds like unfounded speculation
...
..
.
>             regards, tom lane

Hmmm... FWIW, we've compiled some more info to illustrate the problem: a snippet from the Solaris 10 "fopen"
manpage, test programs showing the "difference" between open and fopen calls under Solaris, a Postgres log file
with extended logging, and a diff of fd.c showing how the additional log entries were produced. I hope this makes it
somewhat clearer: it is not a system-wide but a per-process issue, related to Solaris's 32-bit ABI backwards
compatibility... surely not a world-wide problem, one would agree. Nevertheless, we thought it prudent to at least
mention it.
Btw, here's what Oracle has to say: http://www.oracle.com/technetwork/server-storage/solaris10/stdio-256-136698.html
I hope the 15K attachment gets through...

cheers,
hardy



Attachment