Thread: AIX and EAGAIN on open()

AIX and EAGAIN on open()

From: Christoph Berg
Hello,

a customer running PG on AIX [1] is occasionally seeing "Resource
temporarily unavailable" (EAGAIN) returned by open() calls:

[1] We have PostgreSQL 11.13 on powerpc-ibm-aix7.2.5.0, compiled by /opt/IBM/xlc/13.1.0/bin/xlc, 64-bit

2022-05-19 03:28:13 CEST:127.0.0.1(63265):x@x:[64029168]: ERROR:  could not open file "base/16401/935915821_fsm": Resource temporarily unavailable
2022-05-19 03:28:13 CEST:127.0.0.1(63265):x@x:[64029168]: CONTEXT:  SQL statement "INSERT INTO s[...]"
        PL/pgSQL function s...() line 12 at SQL statement
2022-05-19 03:28:13 CEST:127.0.0.1(63265):x@x:[64029168]: STATEMENT:  PREPARE ... AS insert into ...


2022-04-16 01:45:31 CEST:127.0.0.1(58946):x@x:[20906970]: ERROR:  could not access status of transaction 0
2022-04-16 01:45:31 CEST:127.0.0.1(58946):x@x:[20906970]: DETAIL:  Could not open file "pg_subtrans/6158": Resource temporarily unavailable.
2022-04-16 01:45:31 CEST:127.0.0.1(58946):x@x:[20906970]: STATEMENT:  PREPARE ... AS update ...


2020-12-01 09:24:30 CET:127.0.0.1(59898):x@x:[6227520]: ERROR:  could not access status of transaction 0
2020-12-01 09:24:30 CET:127.0.0.1(59898):x@x:[6227520]: DETAIL:  Could not open file "pg_subtrans/AC9E": Resource temporarily unavailable.
2020-12-01 09:24:30 CET:127.0.0.1(59898):x@x:[6227520]: STATEMENT:  PREPARE ... AS DELETE FROM ....

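For reference, PostgreSQL fills in the error text in these messages from the C library's description of errno (the %m in its message formats), so the wording above is the strerror() text for EAGAIN. A minimal sketch of that mapping, assuming a C library whose EAGAIN text reads "Resource temporarily unavailable" as it does on this AIX system; format_open_error is a hypothetical helper that only mimics the message shape, not PostgreSQL's actual code:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build a message shaped like the
 * "could not open file ...: <errno text>" lines in the logs above.
 * strerror() supplies the errno text, roughly as %m does in PostgreSQL.
 */
void format_open_error(char *buf, size_t len, const char *path, int errnum)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "could not open file \"%s\": %s", path, strerror(errnum));
}
```

Calling format_open_error(buf, sizeof buf, "base/16401/935915821_fsm", EAGAIN) reproduces the first log line's message text.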

According to POSIX [2], open() on a regular file should not return EAGAIN,

[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html#tag_16_357_05

and the AIX documentation says it would only return EAGAIN if O_TRUNC
is used [3], but as far as I can tell, PG does not use that flag.

[3] https://www.ibm.com/docs/en/aix/7.2?topic=o-open-openat-openx-openxat-open64-open64at-open64x-open64xat-creat-creat64-subroutine

IBM's reply to the issue back in December 2020 was this:

  The man page / infocenter document is not intended as an exhaustive
  list of all possible error codes returned and their circumstances.
  "Resource temporarily unavailable" may also be returned for
  O_NSHARE, O_RSHARE with O_NONBLOCK.

Afaict, PG does not use these flags either.
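
The conditions named so far can be written down as a flag check, which is one way to compare them against whatever flags a trace shows reaching open(). A minimal sketch: documented_eagain_flags is a hypothetical helper, it encodes only the two conditions quoted in this thread (O_TRUNC per the AIX open() page [3], and O_NSHARE or O_RSHARE together with O_NONBLOCK per IBM's reply), and since O_NSHARE/O_RSHARE are AIX-specific they are only checked where <fcntl.h> defines them.

```c
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true if `flags` matches one of the open() conditions
 * that, per the AIX documentation and IBM's reply quoted above, can yield
 * EAGAIN. Not exhaustive: IBM itself says the documented list is incomplete.
 */
bool documented_eagain_flags(int flags)
{
    /* AIX open() documentation: EAGAIN only when O_TRUNC is set. */
    if (flags & O_TRUNC)
        return true;

#if defined(O_NSHARE) && defined(O_RSHARE)
    /* IBM's reply: O_NSHARE or O_RSHARE combined with O_NONBLOCK (AIX-only flags). */
    if ((flags & (O_NSHARE | O_RSHARE)) && (flags & O_NONBLOCK))
        return true;
#endif

    return false;
}
```

For plain read-write flags such as O_RDWR (assumed here as typical for opening an existing file; a truss trace would show the real ones) this returns false, which is exactly why the reported EAGAIN is puzzling.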

We also ruled out that the system is using any anti-virus or similar
tooling that would intercept IO traffic.

Does anything of that ring a bell for someone? Is that an AIX bug, a
PG bug, or something else?

Christoph
-- 
Senior Consultant, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Managing directors: Dr. Michael Meskes, Geoff Richardson, Peter Lilley
Our handling of personal data is subject to the following
provisions: https://www.credativ.de/datenschutz



Re: AIX and EAGAIN on open()

From: Thomas Munro
On Mon, Jun 20, 2022 at 9:53 PM Christoph Berg
<christoph.berg@credativ.de> wrote:
> IBM's reply to the issue back in December 2020 was this:
>
>   The man page / infocenter document is not intended as an exhaustive
>   list of all possible error codes returned and their circumstances.
>   "Resource temporarily unavailable" may also be returned for
>   O_NSHARE, O_RSHARE with O_NONBLOCK.
>
> Afaict, PG does not use these flags either.
>
> We also ruled out that the system is using any anti-virus or similar
> tooling that would intercept IO traffic.
>
> Does anything of that ring a bell for someone? Is that an AIX bug, a
> PG bug, or something else?

No clue here.  Anything unusual about the file system (NFS, etc.)?  Can
you truss/strace the system calls, to sanity check the flags arriving
into open(), and see if there's any unexpected other activity around
open() calls that might be coming from something you're linked
against?
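
One way to act on the flags suggestion without a full truss run, if a debug build were an option, is to log what actually reaches open(). A minimal sketch under that assumption: describe_open_flags and traced_open are hypothetical names rather than PostgreSQL functions, the decoded list covers only common POSIX flags plus AIX's O_NSHARE/O_RSHARE where <fcntl.h> defines them, and only failures are logged, with errno preserved for the caller.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Append "name" to buf, separated by "|"; snprintf truncates safely. */
static void append_flag(char *buf, size_t len, const char *name)
{
    size_t used = strlen(buf);
    snprintf(buf + used, len - used, "%s%s", used > 0 ? "|" : "", name);
}

/* Hypothetical helper: write a "|"-separated list of the flags set in `flags`. */
void describe_open_flags(int flags, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    /* The access mode is an enumerated field, not independent bits. */
    switch (flags & O_ACCMODE) {
    case O_RDONLY: append_flag(buf, len, "O_RDONLY"); break;
    case O_WRONLY: append_flag(buf, len, "O_WRONLY"); break;
    case O_RDWR:   append_flag(buf, len, "O_RDWR");   break;
    default:       append_flag(buf, len, "O_ACCMODE?"); break;
    }
    if (flags & O_CREAT)    append_flag(buf, len, "O_CREAT");
    if (flags & O_EXCL)     append_flag(buf, len, "O_EXCL");
    if (flags & O_TRUNC)    append_flag(buf, len, "O_TRUNC");
    if (flags & O_APPEND)   append_flag(buf, len, "O_APPEND");
    if (flags & O_NONBLOCK) append_flag(buf, len, "O_NONBLOCK");
#ifdef O_NSHARE
    if (flags & O_NSHARE)   append_flag(buf, len, "O_NSHARE");   /* AIX only */
#endif
#ifdef O_RSHARE
    if (flags & O_RSHARE)   append_flag(buf, len, "O_RSHARE");   /* AIX only */
#endif
}

/* Hypothetical wrapper: call open() and, on failure, log path, flags and errno. */
int traced_open(const char *path, int flags, mode_t mode)
{
    int fd = open(path, flags, mode);

    if (fd < 0) {
        int saved_errno = errno;   /* fprintf may clobber errno */
        char desc[128];

        describe_open_flags(flags, desc, sizeof desc);
        fprintf(stderr, "open(\"%s\", %s) failed: errno=%d (%s)\n",
                path, desc, saved_errno, strerror(saved_errno));
        errno = saved_errno;
    }
    return fd;
}
```

Logging the numeric errno next to its text removes any doubt about which code open() actually returned.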



Re: AIX and EAGAIN on open()

From: Christoph Berg
Re: Thomas Munro
> > Does anything of that ring a bell for someone? Is that an AIX bug, a
> > PG bug, or something else?
> 
> No clue here.  Anything unusual about the file system (NFS etc)?  Can
> you truss/strace the system calls, to sanity check the flags arriving
> into open(), and see if there's any unexpected other activity around
> open() calls that might be coming from something you're linked
> against?

Hi,

It's local storage rather than NFS: a 16Gb SAN with Unity 500 storage,
all data on SSD disks. The file system is JFS2 (mount options are
rw,log=INLINE).

Good point about the flags, but we don't have access to the servers,
so I'm not sure whether we will be able to get strace output.
I'll try asking.

Thanks,
Christoph