Thread: BUG #2406: Not all systems support SHM_SHARE_MMU
The following bug has been logged online:

Bug reference: 2406
Logged by: Paul van der Zwan
Email address: paul.vanderzwan@sun.com
PostgreSQL version: 8.1.3
Operating system: Solaris
Description: Not all systems support SHM_SHARE_MMU
Details:

Only systems with large page sizes support ISM, so always defining

    #define PG_SHMAT_FLAGS SHM_SHARE_MMU

in src/backend/port/sysv_shmem.c will cause all calls to shmat to fail with EINVAL on systems that do not support large pages.

The following may be a better check:

    #ifdef SHM_SHARE_MMU
    #define PG_SHMAT_FLAGS ((getpagesizes(NULL, 0) > 1) ? SHM_SHARE_MMU : 0)
    #else
    #define PG_SHMAT_FLAGS 0
    #endif

This problem manifested itself on a VIA Mini-ITX system running Solaris Nevada (build 36).

Paul van der Zwan
"Paul van der Zwan" <paul.vanderzwan@sun.com> writes: > Only systems with large pagesizes support ISM, so always defining > #define PG_SHMAT_FLAGS SHM_SHARE_MMU > in src/backend/port/sysv_shmem.c will cause all calls to shmat to fail with > EINVAL on systems that do not support large pages. That code's been in there since PG 7.3, and no one before you has complained. Are you sure you've identified the problem correctly? regards, tom lane
Paul van der Zwan <Paul.Vanderzwan@Sun.COM> writes:
> Maybe no one ever ran Postgres on Solaris on a VIA Epia system.

Maybe. What is a "VIA Epia system"?

Frankly, I'm afraid that your patch is likely to break way more systems than it fixes. What is getpagesizes(), and is it guaranteed to exist on *every* Solaris system? What the heck correlation does its result have to whether SHM_SHARE_MMU will work?

regards, tom lane
Paul van der Zwan <Paul.Vanderzwan@Sun.COM> writes:
> AFAIK getpagesizes() appeared in 2001 so that probably means it is
> missing in anything before Solaris 9.

We could handle this without relying on getpagesizes() by just trying and falling back:

    #ifdef SHM_SHARE_MMU
        memAddress = shmat(shmid, addr, SHM_SHARE_MMU);
        if (memAddress == (void *) -1 && errno == EINVAL)
            memAddress = shmat(shmid, addr, 0);
    #else
        memAddress = shmat(shmid, addr, 0);
    #endif

However, I would argue that a system is pretty broken if it exposes the SHM_SHARE_MMU #define and then rejects it at runtime.

> I'll see if I can get the x86 experts here to have a look at it...

I think either Solaris/x86 should not expose this #define, or it should silently ignore the bit at runtime. AFAICS, SHM_SHARE_MMU has no guaranteed semantic effect anyway, it's just a performance hint; so ignoring it on platforms that can't handle it is reasonable.

regards, tom lane
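The try-and-fall-back idea above can be sketched as a small standalone helper. This is only an illustration under stated assumptions, not the code in sysv_shmem.c: the names `attach_with_fallback`, `attach_roundtrip_demo`, and `PREFERRED_SHMAT_FLAGS` are hypothetical, and on platforms that do not define SHM_SHARE_MMU the preferred flag set is simply empty, so the retry path is never taken.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* SHM_SHARE_MMU is Solaris-specific; elsewhere we ask for nothing special. */
#ifdef SHM_SHARE_MMU
#define PREFERRED_SHMAT_FLAGS SHM_SHARE_MMU
#else
#define PREFERRED_SHMAT_FLAGS 0
#endif

/*
 * Hypothetical helper: attach with the preferred flags first, and retry
 * with no flags only if the kernel rejected the request with EINVAL.
 * Returns (void *) -1 on failure, like shmat() itself.
 */
static void *attach_with_fallback(int shmid, const void *addr, int preferred_flags)
{
    void *mem = shmat(shmid, addr, preferred_flags);

    if (mem == (void *) -1 && errno == EINVAL && preferred_flags != 0)
        mem = shmat(shmid, addr, 0);
    return mem;
}

/*
 * Hypothetical round-trip demo: create a private segment, attach it via
 * the helper, write and read a short string, then clean up.
 * Returns 0 on success and -1 on any failure.
 */
static int attach_roundtrip_demo(size_t size)
{
    int   shmid;
    char *mem;
    int   matched;

    assert(size >= 3);          /* room for "ok" plus the terminator */

    shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (shmid < 0)
        return -1;

    mem = attach_with_fallback(shmid, NULL, PREFERRED_SHMAT_FLAGS);

    /* Mark the segment for removal; it disappears after the last detach. */
    shmctl(shmid, IPC_RMID, NULL);

    if (mem == (void *) -1)
        return -1;

    strcpy(mem, "ok");
    matched = (strcmp(mem, "ok") == 0);

    if (shmdt(mem) != 0)
        return -1;
    return matched ? 0 : -1;
}
```

Calling `attach_roundtrip_demo(4096)` from a test program exercises the helper end to end. Retrying only on EINVAL keeps other failures (for example EACCES or ENOMEM) visible to the caller instead of masking them behind a second attempt.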
On 25-apr-2006, at 7:48, Tom Lane wrote:
> "Paul van der Zwan" <paul.vanderzwan@sun.com> writes:
>> Only systems with large pagesizes support ISM, so always defining
>> #define PG_SHMAT_FLAGS SHM_SHARE_MMU
>> in src/backend/port/sysv_shmem.c will cause all calls to shmat to
>> fail with EINVAL on systems that do not support large pages.
>
> That code's been in there since PG 7.3, and no one before you has
> complained. Are you sure you've identified the problem correctly?
>
> regards, tom lane

I am 99% sure that is the cause. If I put shmsys:ism_off=1 in /etc/system, the SHM_SHARE_MMU flag is ignored and it works. Maybe no one ever ran Postgres on Solaris on a VIA Epia system.

I haven't rebuilt postgres with my suggested patch (yet), so that's where the 1% doubt comes in. I'll try to do that sometime this week.

Paul
On 25-apr-2006, at 9:08, Tom Lane wrote:
> Paul van der Zwan <Paul.Vanderzwan@Sun.COM> writes:
>> Maybe noone ever ran Postgres on Solaris on a VIA Epia system.
>
> Maybe. What is a "VIA Epia system"?
>

VIA is a hardware manufacturer that makes small, low-power boards with its own x86-compatible CPU on them. You can find more about it at:
http://www.via.com.tw/en/products/mainboards/mini_itx/epia/index.jsp

> Frankly, I'm afraid that your patch is likely to break way more systems
> than it fixes. What is getpagesizes(), and is it guaranteed to exist on
> *every* Solaris system? What the heck correlation does its result have
> to whether SHM_SHARE_MMU will work?

AFAIK getpagesizes() appeared in 2001, so that probably means it is missing in anything before Solaris 9.

If you look at line 308 of
http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/os/shm.c
you'll see that shmat returns EINVAL if only one page size is available. That is what happens on my system, and possibly also on older (32-bit, pre-Ultra) SPARC systems.

My guess is that all UltraSPARC and 'modern' x86/amd64 CPUs support large pages and therefore will never hit this failure mode of shmat().

I'll see if I can get the x86 experts here to have a look at it...

Regards
Paul
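For readers wondering how the page-size count relates to the flag, here is a rough sketch of the runtime check proposed in the bug report, written as functions rather than a macro. Several things are assumptions: the helper names are hypothetical; `getpagesizes(NULL, 0)` is taken to return the number of supported page sizes on Solaris 9 and later (per the thread it is probably missing on older releases, so this would not build there); and on non-Solaris platforms the sketch simply reports a single page size.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#if defined(__sun)
#include <sys/mman.h>           /* declares getpagesizes() on Solaris 9+ */
#endif

/* Use the ISM flag where the platform defines it, otherwise no flag. */
#ifdef SHM_SHARE_MMU
#define ISM_FLAG SHM_SHARE_MMU
#else
#define ISM_FLAG 0
#endif

/*
 * Hypothetical helper: number of page sizes the system supports.
 * Off Solaris, or if the call fails, report just the base page size.
 */
static int count_page_sizes(void)
{
#if defined(__sun)
    int n = getpagesizes(NULL, 0);

    return (n >= 1) ? n : 1;
#else
    return 1;
#endif
}

/* Request ISM only when more than one page size is available. */
static int flags_for_page_size_count(int count, int ism_flag)
{
    return (count > 1) ? ism_flag : 0;
}

/* Hypothetical replacement for a compile-time PG_SHMAT_FLAGS constant. */
static int preferred_shmat_flags(void)
{
    int count = count_page_sizes();

    assert(count >= 1);         /* there is always a base page size */
    return flags_for_page_size_count(count, ISM_FLAG);
}
```

A caller would pass `preferred_shmat_flags()` as the third argument to shmat(). Unlike the try-and-fall-back approach, this depends on getpagesizes() being present at build time, which is the portability concern raised in the thread.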
Paul van der Zwan <Paul.Vanderzwan@Sun.COM> writes:
> On 25-apr-2006, at 16:46, Tom Lane wrote:
>> AFAICS, SHM_SHARE_MMU has no
>> guaranteed semantic effect anyway, it's just a performance hint; so
>> ignoring it on platforms that can't handle it is reasonable.
>>
> I disagree, I have no definite info why it is a hard failure,
> probably because there is no way to communicate to the app that
> its request is ignored.

Which applications do you think will do anything except exactly what you are proposing we do, ie, just redo the call without the flag bit? Why are you going to make every application jump through this hoop in order to cope with a (possibly temporary) inadequacy in some seldom-used versions of Solaris?

We'll probably put in the kluge because we have no other choice, but I strongly disagree that it's our problem.

regards, tom lane
On 25-apr-2006, at 16:46, Tom Lane wrote:
> Paul van der Zwan <Paul.Vanderzwan@Sun.COM> writes:
>> AFAIK getpagesizes() appeared in 2001 so that probably means it is
>> missing in anything before Solaris 9.
>
> We could handle this without relying on getpagesizes() by just trying
> and falling back:
>
> #ifdef SHM_SHARE_MMU
>     memAddress = shmat(shmid, addr, SHM_SHARE_MMU);
>     if (memAddress == (void *) -1 && errno == EINVAL)
>         memAddress = shmat(shmid, addr, 0);
> #else
>     memAddress = shmat(shmid, addr, 0);
> #endif
>

That would be a clean solution (and was suggested by some of my colleagues as well).

> However, I would argue that a system is pretty broken if it exposes the
> SHM_SHARE_MMU #define and then rejects it at runtime.

It is just a define; the fact that the define exists says nothing about whether it has any meaning on a given system. It's not like a HAVE_ISM flag. shmat() can fail for a number of reasons, and one of them is not having ISM available on the current system.

>
>> I'll see if I can get the x86 experts here to have a look at it...
>
> I think either Solaris/x86 should not expose this #define, or it should
> silently ignore the bit at runtime. AFAICS, SHM_SHARE_MMU has no
> guaranteed semantic effect anyway, it's just a performance hint; so
> ignoring it on platforms that can't handle it is reasonable.
>

I disagree. I have no definite info why it is a hard failure; probably it is because there is no way to communicate to the app that its request is ignored. System calls either fail or succeed, and introducing a new errno value just for this is overkill, I guess.

> regards, tom lane

Regards
Paul
On 25-apr-2006, at 20:34, Tom Lane wrote:
> Paul van der Zwan <Paul.Vanderzwan@Sun.COM> writes:
>> On 25-apr-2006, at 16:46, Tom Lane wrote:
>>> AFAICS, SHM_SHARE_MMU has no
>>> guaranteed semantic effect anyway, it's just a performance hint; so
>>> ignoring it on platforms that can't handle it is reasonable.
>>>
>> I disagree, I have no definite info why it is a hard failure,
>> probably because there is no way to communicate to the app that
>> it's request is ignored.
>
> Which applications do you think will do anything except exactly what you
> are proposing we do, ie, just redo the call without the flag bit? Why
> are you going to make every application jump through this hoop in order
> to cope with a (possibly temporary) inadequacy in some seldom-used
> versions of Solaris?
>
> We'll probably put in the kluge because we have no other choice, but
> I strongly disagree that it's our problem.

I think I have to make something clear: I am not part of the Solaris Engineering group, and even though I work for Sun, I personally have probably less influence on Solaris than a customer. What I write is my personal opinion, and I should insert the usual disclaimer about me not 'officially' representing Sun Microsystems.

I personally do believe that silently failing or ignoring something an application asks for explicitly is bad. If the application wants it and does not get it, the OS should communicate this to the application. I feel it is up to the application, and not the OS, to decide how to respond when the request fails. It may be true that all or most applications will just redo the call, or they may do something else because ISM is not present; to be honest, I do not know. The code you suggested is IMHO a clean way to ask for an optimization, gracefully accept the denial, and continue without it.

My guess is that the absence of ISM on the VIA CPU is purely a hardware issue and not related to a 'seldom used version of Solaris', as there are no different versions of Solaris, only different releases. If the hardware does not support something, it may be difficult or impossible for an OS to implement a feature. It would be nice, though, if every CPU supported large pages so the failure would never happen.

Regards
Paul