Thread: LOG: munmap(0x7fff80000000) failed: Invalid argument
Howdy.
I have an Ubuntu 14.04 LTS, configured with huge pages (4x1GB, disabled transparent) and a later kernel than what ships with 14.04.
root@ff2:~# uname -a
Linux ff2 3.16.7-ckt11-061515+ #1 SMP Mon Jun 15 18:47:13 CDT 2015 x86_64 x86_64 x86_64 GNU/Linux
It had postgresql-9.3 on it and I installed postgresql-9.5 via these steps:
aptitude remove -y postgresql-9.3
echo "deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main 9.5" > /etc/apt/sources.list.d/postgresql.list
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 7FCC7D46ACCC4CF8
apt-get update
apt-get install -y postgresql-client-9.5 postgresql-common postgresql-9.5
At the end of the install, these were emitted to the screen:
Setting up postgresql-9.5 (9.5.4-1.pgdg14.04+2) ...
Creating new cluster 9.5/main ...
config /etc/postgresql/9.5/main
data /var/lib/postgresql/9.5/main
locale en_US.UTF-8
LOG: munmap(0x7fff80000000) failed: Invalid argument
[... snip 14 or so repeats ...]
LOG: munmap(0x7fff80000000) failed: Invalid argument
socket /var/run/postgresql
port 5433
update-alternatives: using /usr/share/postgresql/9.5/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
* Starting PostgreSQL 9.5 database server [ OK ]
Processing triggers for libc-bin (2.19-0ubuntu6.6) ...
I'm able to connect and I dumped a few default relations.
Is the munmap error of concern? It remains upon rebooting / restarting the server.
Thanks,
Chris
Chris Richards <chris@infinite.io> writes: > Setting up postgresql-9.5 (9.5.4-1.pgdg14.04+2) ... > Creating new cluster 9.5/main ... > config /etc/postgresql/9.5/main > data /var/lib/postgresql/9.5/main > locale en_US.UTF-8 > LOG: munmap(0x7fff80000000) failed: Invalid argument > [... snip 14 or so repeats ...] > LOG: munmap(0x7fff80000000) failed: Invalid argument > socket /var/run/postgresql > port 5433 > update-alternatives: using > /usr/share/postgresql/9.5/man/man1/postmaster.1.gz to provide > /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode > * Starting PostgreSQL 9.5 database server > [ OK ] > Processing triggers for libc-bin (2.19-0ubuntu6.6) ... > I'm able to connect and I dumped a few default relations. > Is the munmap error of concern? It remains upon rebooting / restarting the > server. Seems pretty fishy to me; I don't know what would be causing it. [ digs in code... ] One theory is that PGSharedMemoryDetach is getting called more than once, but I'm not sure how that would happen. Can you characterize where this happens more precisely? What nondefault settings have you got in postgresql.conf? regards, tom lane
On 2016-10-10 18:21:48 -0400, Tom Lane wrote: > Chris Richards <chris@infinite.io> writes: > > Setting up postgresql-9.5 (9.5.4-1.pgdg14.04+2) ... > > Creating new cluster 9.5/main ... > > config /etc/postgresql/9.5/main > > data /var/lib/postgresql/9.5/main > > locale en_US.UTF-8 > > LOG: munmap(0x7fff80000000) failed: Invalid argument > > [... snip 14 or so repeats ...] > > LOG: munmap(0x7fff80000000) failed: Invalid argument > > socket /var/run/postgresql > > port 5433 > > update-alternatives: using > > /usr/share/postgresql/9.5/man/man1/postmaster.1.gz to provide > > /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode > > * Starting PostgreSQL 9.5 database server > > [ OK ] > > Processing triggers for libc-bin (2.19-0ubuntu6.6) ... > > > I'm able to connect and I dumped a few default relations. > > > Is the munmap error of concern? It remains upon rebooting / restarting the > > server. > > Seems pretty fishy to me; I don't know what would be causing it. > > [ digs in code... ] One theory is that PGSharedMemoryDetach is getting > called more than once, but I'm not sure how that would happen. Can you > characterize where this happens more precisely? What nondefault settings > have you got in postgresql.conf? Hm. Could that be from the DSM code? - Andres
Andres Freund <andres@anarazel.de> writes: > On 2016-10-10 18:21:48 -0400, Tom Lane wrote: >> Chris Richards <chris@infinite.io> writes: >>> LOG: munmap(0x7fff80000000) failed: Invalid argument >> [ digs in code... ] One theory is that PGSharedMemoryDetach is getting >> called more than once, but I'm not sure how that would happen. Can you >> characterize where this happens more precisely? What nondefault settings >> have you got in postgresql.conf? > Hm. Could that be from the DSM code? That particular error message spelling only appears in sysv_shmem.c, so it's not directly DSM's fault. The comments around PGSharedMemoryDetach strongly suggest that it ought to be a no-op if called twice, which it originally was but fails to be since the addition of the munmap call. So I'm *very* strongly tempted to add "AnonymousShmem = NULL;" there and in IpcMemoryDetach. But it's not evident why we've not seen this behavior many times already, so I'd kind of like to find out what's different about Chris's use-case before assuming that that will fix it. regards, tom lane
What more would you like--I'm happy to oblige? The upgrade steps I listed installed into a new directory, new conf file. I didn't even migrate my (skeletal) 9.3 database. Here's the 9.5 command-line. I've attached the referenced config file.
$ chrisr@ff1:~$ ps ax | grep bin/post
109999 ? S 0:00 /usr/lib/postgresql/9.5/bin/postgres -D /var/lib/postgresql/9.5/main -c config_file=/etc/postgresql/9.5/main/postgresql.conf
Cheers,
Chris
On Mon, Oct 10, 2016 at 5:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chris Richards <chris@infinite.io> writes:
> Setting up postgresql-9.5 (9.5.4-1.pgdg14.04+2) ...
> Creating new cluster 9.5/main ...
> config /etc/postgresql/9.5/main
> data /var/lib/postgresql/9.5/main
> locale en_US.UTF-8
> LOG: munmap(0x7fff80000000) failed: Invalid argument
> [... snip 14 or so repeats ...]
> LOG: munmap(0x7fff80000000) failed: Invalid argument
> socket /var/run/postgresql
> port 5433
> update-alternatives: using
> /usr/share/postgresql/9.5/man/man1/postmaster.1.gz to provide
> /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
> * Starting PostgreSQL 9.5 database server
> [ OK ]
> Processing triggers for libc-bin (2.19-0ubuntu6.6) ...
> I'm able to connect and I dumped a few default relations.
> Is the munmap error of concern? It remains upon rebooting / restarting the
> server.
Seems pretty fishy to me; I don't know what would be causing it.
[ digs in code... ] One theory is that PGSharedMemoryDetach is getting
called more than once, but I'm not sure how that would happen. Can you
characterize where this happens more precisely? What nondefault settings
have you got in postgresql.conf?
regards, tom lane
Attachment
One extra bit, if this helps:
chrisr@ff1:~$ cat /proc/meminfo
MemTotal: 8163104 kB
MemFree: 2183692 kB
MemAvailable: 3648680 kB
Buffers: 170080 kB
Cached: 1231708 kB
SwapCached: 0 kB
Active: 1083596 kB
Inactive: 442312 kB
Active(anon): 125128 kB
Inactive(anon): 3584 kB
Active(file): 958468 kB
Inactive(file): 438728 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 499708 kB
SwapFree: 499708 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 124124 kB
Mapped: 42472 kB
Shmem: 4596 kB
Slab: 144788 kB
SReclaimable: 97612 kB
SUnreclaim: 47176 kB
KernelStack: 10672 kB
PageTables: 6332 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 2484108 kB
Committed_AS: 900632 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 202276 kB
VmallocChunk: 34359464540 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 4
HugePages_Free: 3
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
DirectMap4k: 83776 kB
DirectMap2M: 4110336 kB
DirectMap1G: 6291456 kB
On Mon, Oct 10, 2016 at 5:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chris Richards <chris@infinite.io> writes:
> Setting up postgresql-9.5 (9.5.4-1.pgdg14.04+2) ...
> Creating new cluster 9.5/main ...
> config /etc/postgresql/9.5/main
> data /var/lib/postgresql/9.5/main
> locale en_US.UTF-8
> LOG: munmap(0x7fff80000000) failed: Invalid argument
> [... snip 14 or so repeats ...]
> LOG: munmap(0x7fff80000000) failed: Invalid argument
> socket /var/run/postgresql
> port 5433
> update-alternatives: using
> /usr/share/postgresql/9.5/man/man1/postmaster.1.gz to provide
> /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
> * Starting PostgreSQL 9.5 database server
> [ OK ]
> Processing triggers for libc-bin (2.19-0ubuntu6.6) ...
> I'm able to connect and I dumped a few default relations.
> Is the munmap error of concern? It remains upon rebooting / restarting the
> server.
Seems pretty fishy to me; I don't know what would be causing it.
[ digs in code... ] One theory is that PGSharedMemoryDetach is getting
called more than once, but I'm not sure how that would happen. Can you
characterize where this happens more precisely? What nondefault settings
have you got in postgresql.conf?
regards, tom lane
Oh the email spam :P
I did another reboot test to validate the error. It does exist; it does not spam munmap error like it did with the aptitude install.
I then marked the log file, shutdown (1 munmap during), marked it again and then started it (1 munmap during).
MARK PRE-SHUTDOWN
2016-10-11 20:02:45 UTC [1656-2] LOG: received fast shutdown request
2016-10-11 20:02:45 UTC [1656-3] LOG: aborting any active transactions
2016-10-11 20:02:45 UTC [1707-2] LOG: autovacuum launcher shutting down
2016-10-11 20:02:45 UTC [1704-1] LOG: shutting down
2016-10-11 20:02:45 UTC [1704-2] LOG: database system is shut down
2016-10-11 20:02:45 UTC [1656-4] LOG: munmap(0x7fff80000000) failed: Invalid argument
MARK PRE-START
2016-10-11 20:03:02 UTC [9894-1] LOG: database system was shut down at 2016-10-11 20:02:45 UTC
2016-10-11 20:03:02 UTC [9894-2] LOG: MultiXact member wraparound protections are now enabled
2016-10-11 20:03:02 UTC [9893-1] LOG: database system is ready to accept connections
2016-10-11 20:03:02 UTC [9898-1] LOG: autovacuum launcher started
2016-10-11 20:03:02 UTC [9899-1] LOG: munmap(0x7fff80000000) failed: Invalid argument
2016-10-11 20:03:02 UTC [9900-1] [unknown]@[unknown] LOG: incomplete startup packet
On Mon, Oct 10, 2016 at 5:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chris Richards <chris@infinite.io> writes:
> Setting up postgresql-9.5 (9.5.4-1.pgdg14.04+2) ...
> Creating new cluster 9.5/main ...
> config /etc/postgresql/9.5/main
> data /var/lib/postgresql/9.5/main
> locale en_US.UTF-8
> LOG: munmap(0x7fff80000000) failed: Invalid argument
> [... snip 14 or so repeats ...]
> LOG: munmap(0x7fff80000000) failed: Invalid argument
> socket /var/run/postgresql
> port 5433
> update-alternatives: using
> /usr/share/postgresql/9.5/man/man1/postmaster.1.gz to provide
> /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
> * Starting PostgreSQL 9.5 database server
> [ OK ]
> Processing triggers for libc-bin (2.19-0ubuntu6.6) ...
> I'm able to connect and I dumped a few default relations.
> Is the munmap error of concern? It remains upon rebooting / restarting the
> server.
Seems pretty fishy to me; I don't know what would be causing it.
[ digs in code... ] One theory is that PGSharedMemoryDetach is getting
called more than once, but I'm not sure how that would happen. Can you
characterize where this happens more precisely? What nondefault settings
have you got in postgresql.conf?
regards, tom lane
Chris Richards <chris@infinite.io> writes: > Oh the email spam :P No problem. I'm starting to think that this isn't actually a Postgres bug, but rather something funny with your kernel. The code in sysv_shmem is certainly as fragile as can be, but I'm darned if I can find an existing code path that would actually break it --- and the lack of other complaints is pointing towards something funny with your system, too. First off, does changing "huge_pages" in postgresql.conf alter the behavior? (It defaults to "try", see what happens with "on" or "off".) Second, I notice that you've got > Hugepagesize: 1048576 kB which is just enormous. PG is only going to request circa 140MB given the settings you mentioned. We've seen reports of kernel bugs that cause mmap() to fail for requests that aren't a multiple of the hugepage size, but I've not heard that mmap() might succeed and then munmap() fail. That seems like what's happening to you though. regards, tom lane
I have another application that consumes all of the huge pages; they aren't for pgsql. :) I've modified the configuration file from "try" to "off" and munmap is no more. Mischief managed.
Thanks for your help.
On Tue, Oct 11, 2016 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chris Richards <chris@infinite.io> writes:
> Oh the email spam :P
No problem. I'm starting to think that this isn't actually a Postgres
bug, but rather something funny with your kernel. The code in sysv_shmem
is certainly as fragile as can be, but I'm darned if I can find an
existing code path that would actually break it --- and the lack of other
complaints is pointing towards something funny with your system, too.
First off, does changing "huge_pages" in postgresql.conf alter the
behavior? (It defaults to "try", see what happens with "on" or "off".)
Second, I notice that you've got
> Hugepagesize: 1048576 kB
which is just enormous. PG is only going to request circa 140MB given
the settings you mentioned. We've seen reports of kernel bugs that cause
mmap() to fail for requests that aren't a multiple of the hugepage size,
but I've not heard that mmap() might succeed and then munmap() fail.
That seems like what's happening to you though.
regards, tom lane