Thread: AIX and EAGAIN on open()
Hello,

a customer running PG on AIX [1] is occasionally seeing "Resource temporarily unavailable" (EAGAIN) returned by open() calls:

  2022-05-19 03:28:13 CEST:127.0.0.1(63265):x@x:[64029168]: ERROR: could not open file "base/16401/935915821_fsm": Resource temporarily unavailable
  2022-05-19 03:28:13 CEST:127.0.0.1(63265):x@x:[64029168]: CONTEXT: SQL statement "INSERT INTO s[...]" PL/pgSQL function s...() line 12 at SQL statement
  2022-05-19 03:28:13 CEST:127.0.0.1(63265):x@x:[64029168]: STATEMENT: PREPARE ... AS insert into ...

  2022-04-16 01:45:31 CEST:127.0.0.1(58946):x@x:[20906970]: ERROR: could not access status of transaction 0
  2022-04-16 01:45:31 CEST:127.0.0.1(58946):x@x:[20906970]: DETAIL: Could not open file "pg_subtrans/6158": Resource temporarily unavailable.
  2022-04-16 01:45:31 CEST:127.0.0.1(58946):x@x:[20906970]: STATEMENT: PREPARE ... AS update ...

  2020-12-01 09:24:30 CET:127.0.0.1(59898):x@x:[6227520]: ERROR: could not access status of transaction 0
  2020-12-01 09:24:30 CET:127.0.0.1(59898):x@x:[6227520]: DETAIL: Could not open file "pg_subtrans/AC9E": Resource temporarily unavailable.
  2020-12-01 09:24:30 CET:127.0.0.1(59898):x@x:[6227520]: STATEMENT: PREPARE ... AS DELETE FROM ....

open() should not return EAGAIN as per POSIX [2], and the AIX documentation says it would only return EAGAIN if O_TRUNC is used [3], but as far as I can tell, PG does not use that flag.

IBM's reply to the issue back in December 2020 was this:

  The man page / infocenter document is not intended as an exhaustive
  list of all possible error codes returned and their circumstances.
  "Resource temporarily unavailable" may also be returned for
  O_NSHARE, O_RSHARE with O_NONBLOCK.

Afaict, PG does not use these flags either.

We also ruled out that the system is using any anti-virus or similar tooling that would intercept IO traffic.

Does anything of that ring a bell for someone? Is that an AIX bug, a PG bug, or something else?

Christoph

[1] We have PostgreSQL 11.13 on powerpc-ibm-aix7.2.5.0, compiled by /opt/IBM/xlc/13.1.0/bin/xlc, 64-bit
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html#tag_16_357_05
[3] https://www.ibm.com/docs/en/aix/7.2?topic=o-open-openat-openx-openxat-open64-open64at-open64x-open64xat-creat-creat64-subroutine

-- 
Senior Consultant, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, VAT ID: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Geoff Richardson, Peter Lilley
Our handling of personal data is subject to the following provisions: https://www.credativ.de/datenschutz
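The flag reasoning in the message above can be written down as a small check. This is a minimal, hypothetical helper (flags_can_explain_eagain is not a PostgreSQL or AIX function) that encodes only the EAGAIN triggers mentioned in this thread: O_TRUNC, per the AIX documentation, and O_NSHARE/O_RSHARE together with O_NONBLOCK, per IBM's reply, read here as "either share flag combined with O_NONBLOCK". O_NSHARE and O_RSHARE are AIX-specific, so the sketch only tests them where <fcntl.h> defines them.

```c
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helper: does this open() flag set contain one of the
 * combinations for which AIX is documented (or, per IBM's reply, said)
 * to return EAGAIN?  Only the cases named in this thread are checked.
 */
bool flags_can_explain_eagain(int flags)
{
    /* Case from the AIX open() documentation: O_TRUNC. */
    if (flags & O_TRUNC)
        return true;

#if defined(O_NSHARE) || defined(O_RSHARE)
    {
        int share_bits = 0;

#ifdef O_NSHARE
        share_bits |= O_NSHARE;
#endif
#ifdef O_RSHARE
        share_bits |= O_RSHARE;
#endif
        /* Case from IBM's reply: O_NSHARE / O_RSHARE with O_NONBLOCK. */
        if ((flags & share_bits) && (flags & O_NONBLOCK))
            return true;
    }
#endif

    /* None of the known EAGAIN triggers is present. */
    return false;
}
```

Assuming the relation fork and pg_subtrans files are opened with plain O_RDWR (PG_BINARY is 0 on non-Windows builds), flags_can_explain_eagain(O_RDWR) returns false, which is exactly the mismatch described above.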
On Mon, Jun 20, 2022 at 9:53 PM Christoph Berg <christoph.berg@credativ.de> wrote:
> IBM's reply to the issue back in December 2020 was this:
>
> The man page / infocenter document is not intended as an exhaustive
> list of all possible error codes returned and their circumstances.
> "Resource temporarily unavailable" may also be returned for
> O_NSHARE, O_RSHARE with O_NONBLOCK.
>
> Afaict, PG does not use these flags either.
>
> We also ruled out that the system is using any anti-virus or similar
> tooling that would intercept IO traffic.
>
> Does anything of that ring a bell for someone? Is that an AIX bug, a
> PG bug, or something else?

No clue here. Anything unusual about the file system (NFS etc)? Can you truss/strace the system calls, to sanity check the flags arriving into open(), and see if there's any unexpected other activity around open() calls that might be coming from something you're linked against?
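Besides truss/strace, the "sanity check the flags arriving into open()" idea can also be approached from inside the process by logging the flags at the call site. The wrapper below is a hypothetical sketch (logged_open is not part of PostgreSQL): on failure it prints the path, the flags exactly as passed (in hex), and the error, marking EAGAIN explicitly, and it returns -1 with errno preserved so the caller can still report the error as before.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical diagnostic wrapper around open(): on failure, log the
 * path, the raw flags and errno, then return -1 with errno unchanged.
 */
int logged_open(const char *path, int flags, mode_t mode)
{
    int fd = open(path, flags, mode);

    if (fd < 0)
    {
        int saved_errno = errno;    /* fprintf() may clobber errno */

        fprintf(stderr, "open(\"%s\", 0x%x) failed: %s%s\n",
                path, (unsigned) flags, strerror(saved_errno),
                saved_errno == EAGAIN ? " [EAGAIN]" : "");
        errno = saved_errno;        /* keep errno for the caller */
    }
    return fd;
}
```

Logging the raw hex value rather than decoding it keeps the sketch portable; the bits can be matched against the platform's <fcntl.h> afterwards.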
Re: Thomas Munro
> > Does anything of that ring a bell for someone? Is that an AIX bug, a
> > PG bug, or something else?
>
> No clue here. Anything unusual about the file system (NFS etc)? Can
> you truss/strace the system calls, to sanity check the flags arriving
> into open(), and see if there's any unexpected other activity around
> open() calls that might be coming from something you're linked
> against?

Hi,

it's local storage: 16Gb SAN, Unity 500 storage, all data is on SSD disks, and the file system is JFS2 (mount options are rw,log=INLINE).

Good point about the flags, but we don't have access to the servers, so I'm not sure it will be possible to retrieve strace information. I'll try asking.

Thanks,
Christoph