Problems with huge_pages and IBM Power8 - Mailing list pgsql-hackers

From reiner peterke
Subject Problems with huge_pages and IBM Power8
Msg-id 3489C14D-97DE-4303-97C6-99DBC70775F8@drizzle.com
Responses Re: Problems with huge_pages and IBM Power8  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi

We have been doing some testing with Postgres (9.5.2) compiled on a Power8 running CentOS 7.

When working with huge_pages, we initially got this error.

munmap(0x3efbe4000000) failed: Invalid argument

After a bit of investigation we noticed that hugepagesize is hard-coded to 2MB:

src/backend/port/sysv_shmem.c (line 360)
...
int                     hugepagesize = 2 * 1024 * 1024;

But on the Power8 the huge pages were configured to 16MB.  After recompiling with 16MB (16 * 1024 * 1024) we had no
problems with the tests.
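
For context on the munmap() failure: on Linux, munmap() of a huge page mapping expects a length that is a multiple of the underlying huge page size, so a size rounded up to a 2MB boundary is generally not valid when the pages are 16MB. A minimal sketch of the rounding arithmetic follows; round_up_to_page is a hypothetical helper name used for illustration, not a function from the PostgreSQL source.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not PostgreSQL code): round size up to the next
 * multiple of page_size. page_size must be greater than zero.
 */
static size_t
round_up_to_page(size_t size, size_t page_size)
{
    size_t rem;

    assert(page_size > 0);
    rem = size % page_size;
    return rem == 0 ? size : size + (page_size - rem);
}
```

For example, an 18MB request rounded to a 2MB boundary stays at 18MB, which is not a multiple of 16MB; rounded with a 16MB page size it becomes 32MB.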

My initial questions are:

1 Why is hugepagesize hard-coded to 2MB?
2 Are there any side effects in setting it to 16MB?
3 Since huge pages on the Power8 can have different sizes, would it be possible to make this value configurable?
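
As an illustration of question 3, one possible approach on Linux is to read the default huge page size from the "Hugepagesize:" line of /proc/meminfo at startup rather than compiling it in. The sketch below is only an assumption about how that could look; parse_hugepagesize_line and detect_huge_page_size are hypothetical names, not existing PostgreSQL functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse one line of /proc/meminfo.
 * Returns 1 and stores the size in kB if the line is "Hugepagesize: N kB".
 */
static int
parse_hugepagesize_line(const char *line, unsigned long *size_kb)
{
    assert(line != NULL && size_kb != NULL);
    return sscanf(line, "Hugepagesize: %lu kB", size_kb) == 1;
}

/*
 * Hypothetical helper: return the default huge page size in bytes,
 * or fallback if /proc/meminfo is unavailable or has no such line.
 */
static size_t
detect_huge_page_size(size_t fallback)
{
    FILE       *fp = fopen("/proc/meminfo", "r");
    char        line[128];
    unsigned long kb;
    size_t      result = fallback;

    if (fp == NULL)
        return fallback;

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        if (parse_hugepagesize_line(line, &kb) && kb > 0)
        {
            result = (size_t) kb * 1024;
            break;
        }
    }
    fclose(fp);
    return result;
}
```

Calling detect_huge_page_size(2 * 1024 * 1024) would keep the current 2MB value as the fallback on systems where the size cannot be read.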

Going further, we also tried testing huge pages on Ubuntu 16.04, again on the Power8.  On Ubuntu, Postgres did not like
the huge pages at all (also set to 16MB) and consistently crashed.

We are looking for some insight into this issue.  The error from the Postgres log on Ubuntu is below.
It appears to be related to semaphores.

I don't have the compile options at the moment; I can provide those or other details as needed.

Reiner

2016-04-12 12:26:42 CEST : 0 FATAL:  semctl(7864340, 14, SETVAL, 0) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  server process (PID 13352) exited with exit code 1
2016-04-12 12:26:42 CEST : 0 LOG:  terminating any other active server processes
2016-04-12 12:26:42 CEST facturation:system_dba 0 10.32.32.200WARNING:  terminating connection because of crash of another server process
2016-04-12 12:26:42 CEST facturation:system_dba 0 10.32.32.200DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2016-04-12 12:26:42 CEST facturation:system_dba 0 10.32.32.200HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2016-04-12 12:26:42 CEST postgres:admin 0 10.32.16.3WARNING:  terminating connection because of crash of another server process
2016-04-12 12:26:42 CEST postgres:admin 0 10.32.16.3DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2016-04-12 12:26:42 CEST postgres:admin 0 10.32.16.3HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2016-04-12 12:26:42 CEST postgres:perf_user 0 ::1WARNING:  terminating connection because of crash of another server process
2016-04-12 12:26:42 CEST postgres:perf_user 0 ::1DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2016-04-12 12:26:42 CEST postgres:perf_user 0 ::1HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2016-04-12 12:26:42 CEST : 0 WARNING:  terminating connection because of crash of another server process
2016-04-12 12:26:42 CEST : 0 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2016-04-12 12:26:42 CEST : 0 HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2016-04-12 12:26:42 CEST : 0 LOG:  all server processes terminated; reinitializing
2016-04-12 12:26:42 CEST : 0 LOG:  could not remove shared memory segment "/PostgreSQL.1612071802": No such file or directory
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7274497, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7307267, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7340036, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7372805, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7405574, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7438343, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7471112, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7503881, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7536650, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7569419, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7602188, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7634957, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7667726, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7700495, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7733264, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7766033, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7798802, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7831571, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7864340, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:42 CEST : 0 LOG:  semctl(7897109, 0, IPC_RMID, ...) failed: Invalid argument
2016-04-12 12:26:43 CEST : 0 LOG:  database system was interrupted; last known up at 2016-04-12 12:22:08 CEST
2016-04-12 12:26:43 CEST : 0 LOG:  database system was not properly shut down; automatic recovery in progress
2016-04-12 12:26:43 CEST : 0 LOG:  redo starts at 0/1FDD8F0
2016-04-12 12:26:43 CEST : 0 LOG:  invalid record length at 0/20344B0
2016-04-12 12:26:43 CEST : 0 LOG:  redo done at 0/2034488
2016-04-12 12:26:43 CEST : 0 LOG:  last completed transaction was at log time 2016-04-12 12:26:01.704901+02

