Thread: BUG #2406: Not all systems support SHM_SHARE_MMU

BUG #2406: Not all systems support SHM_SHARE_MMU

From
"Paul van der Zwan"
Date:
The following bug has been logged online:

Bug reference:      2406
Logged by:          Paul van der Zwan
Email address:      paul.vanderzwan@sun.com
PostgreSQL version: 8.1.3
Operating system:   Solaris
Description:        Not all systems support SHM_SHARE_MMU
Details:

Only systems with large pagesizes support ISM, so always defining
#define PG_SHMAT_FLAGS                  SHM_SHARE_MMU
in src/backend/port/sysv_shmem.c  will cause all calls to shmat to fail with
EINVAL on systems that do not support large pages.
The following may be a better check:
#ifdef SHM_SHARE_MMU
#define PG_SHMAT_FLAGS ((getpagesizes(NULL,0)>1)?SHM_SHARE_MMU:0)
#else
#define PG_SHMAT_FLAGS 0
#endif

This problem manifested itself on a VIA Mini ITX system running Solaris
Nevada (build 36).

 Paul van der Zwan

Re: BUG #2406: Not all systems support SHM_SHARE_MMU

From
Tom Lane
Date:
"Paul van der Zwan" <paul.vanderzwan@sun.com> writes:
> Only systems with large pagesizes support ISM, so always defining
> #define PG_SHMAT_FLAGS                  SHM_SHARE_MMU
> in src/backend/port/sysv_shmem.c  will cause all calls to shmat to fail with
> EINVAL on systems that do not support large pages.

That code's been in there since PG 7.3, and no one before you has
complained.  Are you sure you've identified the problem correctly?

            regards, tom lane

Re: BUG #2406: Not all systems support SHM_SHARE_MMU

From
Tom Lane
Date:
Paul van der Zwan <Paul.Vanderzwan@Sun.COM> writes:
> Maybe no one ever ran Postgres on Solaris on a VIA Epia system.

Maybe.  What is a "VIA Epia system"?

Frankly, I'm afraid that your patch is likely to break way more systems
than it fixes.  What is getpagesizes(), and is it guaranteed to exist on
*every* Solaris system?  What the heck correlation does its result have
to whether SHM_SHARE_MMU will work?

            regards, tom lane

Re: BUG #2406: Not all systems support SHM_SHARE_MMU

From
Tom Lane
Date:
Paul van der Zwan <Paul.Vanderzwan@Sun.COM> writes:
> AFAIK getpagesizes() appeared in 2001 so that probably means it is
> missing in anything before Solaris 9.

We could handle this without relying on getpagesizes() by just trying
and falling back:

#ifdef SHM_SHARE_MMU
    memAddress = shmat(shmid, addr, SHM_SHARE_MMU);
    if (memAddress == (void *) -1 && errno == EINVAL)
        memAddress = shmat(shmid, addr, 0);
#else
    memAddress = shmat(shmid, addr, 0);
#endif

However, I would argue that a system is pretty broken if it exposes the
SHM_SHARE_MMU #define and then rejects it at runtime.

> I'll see if I can get the x86 experts here to have a look at it...

I think either Solaris/x86 should not expose this #define, or it should
silently ignore the bit at runtime.  AFAICS, SHM_SHARE_MMU has no
guaranteed semantic effect anyway, it's just a performance hint; so
ignoring it on platforms that can't handle it is reasonable.

            regards, tom lane
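
The try-and-fall-back idea above can be sketched as a small self-contained helper. This is only an illustration, not the code that went into sysv_shmem.c: the names attach_with_fallback, attach_fn and fake_attach_no_ism are hypothetical, and the fake attach function merely imitates a system that rejects the ISM flag with EINVAL.

```c
#include <errno.h>
#include <stddef.h>

/* Same shape as shmat(), so the real call can be passed in directly. */
typedef void *(*attach_fn)(int shmid, const void *addr, int flags);

/* Try the attach with ism_flag; if that is rejected with EINVAL,
 * retry once without the flag. Any other failure is returned as-is. */
static void *attach_with_fallback(attach_fn attach, int shmid,
                                  const void *addr, int ism_flag)
{
    void *mem = attach(shmid, addr, ism_flag);

    if (mem == (void *) -1 && errno == EINVAL && ism_flag != 0)
        mem = attach(shmid, addr, 0);
    return mem;
}

/* Hypothetical stand-in for shmat() on a system without ISM:
 * any nonzero flag is rejected with EINVAL. */
static int fake_calls = 0;
static char fake_segment[64];

static void *fake_attach_no_ism(int shmid, const void *addr, int flags)
{
    (void) shmid;
    (void) addr;
    fake_calls++;
    if (flags != 0)
    {
        errno = EINVAL;
        return (void *) -1;
    }
    return fake_segment;
}
```

On a real system the first argument would be shmat itself and the last argument SHM_SHARE_MMU where the headers define it; the caller cannot tell from the return value alone whether the flag was honored.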

Re: BUG #2406: Not all systems support SHM_SHARE_MMU

From
Paul van der Zwan
Date:
On 25-apr-2006, at 7:48, Tom Lane wrote:

> "Paul van der Zwan" <paul.vanderzwan@sun.com> writes:
>> Only systems with large pagesizes support ISM, so always defining
>> #define PG_SHMAT_FLAGS                  SHM_SHARE_MMU
>> in src/backend/port/sysv_shmem.c  will cause all calls to shmat to fail
>> with EINVAL on systems that do not support large pages.
>
> That code's been in there since PG 7.3, and no one before you has
> complained.  Are you sure you've identified the problem correctly?
>
>             regards, tom lane

I am 99% sure that is the cause. If I put shmsys:ism_off=1 in /etc/system,
the SHM_SHARE_MMU flag is ignored and it works.
Maybe no one ever ran Postgres on Solaris on a VIA Epia system.
I haven't rebuilt postgres with my suggested patch (yet), so that's
where the 1% doubt comes in.
I'll try to do that sometime this week.


    Paul

Re: BUG #2406: Not all systems support SHM_SHARE_MMU

From
Paul van der Zwan
Date:
On 25-apr-2006, at 9:08, Tom Lane wrote:

> Paul van der Zwan <Paul.Vanderzwan@Sun.COM> writes:
>> Maybe no one ever ran Postgres on Solaris on a VIA Epia system.
>
> Maybe.  What is a "VIA Epia system"?
>

VIA is a hardware manufacturer that makes small, low-power boards with
their own x86-compatible CPU on them. You can find more about it at:
http://www.via.com.tw/en/products/mainboards/mini_itx/epia/index.jsp

> Frankly, I'm afraid that your patch is likely to break way more systems
> than it fixes.  What is getpagesizes(), and is it guaranteed to exist on
> *every* Solaris system?  What the heck correlation does its result have
> to whether SHM_SHARE_MMU will work?

AFAIK getpagesizes() appeared in 2001, so that probably means it is
missing in anything before Solaris 9.

If you look at line 308 of
http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/os/shm.c
you'll see that shmat returns EINVAL if only one pagesize is available.
That is what happens on my system, and possibly also on older (32-bit,
pre-Ultra) SPARC systems.

My guess is that all UltraSPARC and 'modern' x86/amd64 CPUs support
large pages and therefore will never hit this failure mode of shmat().
I'll see if I can get the x86 experts here to have a look at it...

    Regards

            Paul
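
A minimal sketch of the page-size probe discussed above, for illustration only. It assumes getpagesizes() is declared in <sys/mman.h> on Solaris 9 and later and returns the number of supported page sizes when called with a NULL array and a count of 0. The helpers count_page_sizes and ism_flag_if_usable are hypothetical names, and the non-Solaris branch simply reports one page size so that the sketch compiles elsewhere.

```c
#include <stddef.h>
#if defined(__sun)
#include <sys/mman.h>   /* assumed to declare getpagesizes() (Solaris 9+) */
#endif

/* Number of page sizes the system reports, or 1 when getpagesizes()
 * is not available in this build (assumption: non-Solaris platform). */
static int count_page_sizes(void)
{
#if defined(__sun)
    int n = getpagesizes(NULL, 0);

    return (n > 0) ? n : 1;
#else
    return 1;
#endif
}

/* Return ism_flag only when more than one page size is available,
 * mirroring the condition under which shmat is said to accept it. */
static int ism_flag_if_usable(int ism_flag)
{
    return (count_page_sizes() > 1) ? ism_flag : 0;
}
```

A caller would pass SHM_SHARE_MMU as ism_flag. A build on Solaris 8 or earlier would still fail to compile the __sun branch, which is the portability concern raised in this thread.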

Re: BUG #2406: Not all systems support SHM_SHARE_MMU

From
Tom Lane
Date:
Paul van der Zwan <Paul.Vanderzwan@Sun.COM> writes:
> On 25-apr-2006, at 16:46, Tom Lane wrote:
>> AFAICS, SHM_SHARE_MMU has no
>> guaranteed semantic effect anyway, it's just a performance hint; so
>> ignoring it on platforms that can't handle it is reasonable.
>>
> I disagree. I have no definite info on why it is a hard failure;
> probably because there is no way to communicate to the app that its
> request is ignored.

Which applications do you think will do anything except exactly what you
are proposing we do, ie, just redo the call without the flag bit?  Why
are you going to make every application jump through this hoop in order
to cope with a (possibly temporary) inadequacy in some seldom-used
versions of Solaris?

We'll probably put in the kluge because we have no other choice, but
I strongly disagree that it's our problem.

            regards, tom lane

Re: BUG #2406: Not all systems support SHM_SHARE_MMU

From
Paul van der Zwan
Date:
On 25-apr-2006, at 16:46, Tom Lane wrote:

> Paul van der Zwan <Paul.Vanderzwan@Sun.COM> writes:
>> AFAIK getpagesizes() appeared in 2001 so that probably means it is
>> missing in anything before Solaris 9.
>
> We could handle this without relying on getpagesizes() by just trying
> and falling back:
>
> #ifdef SHM_SHARE_MMU
>     memAddress = shmat(shmid, addr, SHM_SHARE_MMU);
>     if (memAddress == (void *) -1 && errno == EINVAL)
>         memAddress = shmat(shmid, addr, 0);
> #else
>     memAddress = shmat(shmid, addr, 0);
> #endif
>
That would be a clean solution (and was suggested by some of my
colleagues as well).

> However, I would argue that a system is pretty broken if it exposes the
> SHM_SHARE_MMU #define and then rejects it at runtime.

It is just a define; the fact that this define exists has nothing to do
with it having any meaning. It's not like a HAVE_ISM flag. shmat() can
fail for a number of reasons; one of them is not having ISM available on
the current system.

>
>> I'll see if I can get the x86 experts here to have a look at it...
>
> I think either Solaris/x86 should not expose this #define, or it should
> silently ignore the bit at runtime.  AFAICS, SHM_SHARE_MMU has no
> guaranteed semantic effect anyway, it's just a performance hint; so
> ignoring it on platforms that can't handle it is reasonable.
>
I disagree. I have no definite info on why it is a hard failure;
probably because there is no way to communicate to the app that its
request is ignored. System calls either fail or succeed. And introducing
a new errno value just for this is overkill, I guess.
>             regards, tom lane

Regards
    Paul

Re: BUG #2406: Not all systems support SHM_SHARE_MMU

From
Paul van der Zwan
Date:
On 25-apr-2006, at 20:34, Tom Lane wrote:

> Paul van der Zwan <Paul.Vanderzwan@Sun.COM> writes:
>> On 25-apr-2006, at 16:46, Tom Lane wrote:
>>> AFAICS, SHM_SHARE_MMU has no
>>> guaranteed semantic effect anyway, it's just a performance hint; so
>>> ignoring it on platforms that can't handle it is reasonable.
>>>
>> I disagree. I have no definite info on why it is a hard failure;
>> probably because there is no way to communicate to the app that its
>> request is ignored.
>
> Which applications do you think will do anything except exactly what you
> are proposing we do, ie, just redo the call without the flag bit?  Why
> are you going to make every application jump through this hoop in order
> to cope with a (possibly temporary) inadequacy in some seldom-used
> versions of Solaris?
>
> We'll probably put in the kluge because we have no other choice, but
> I strongly disagree that it's our problem.


I think I have to make something clear: I am not part of the Solaris
Engineering group, and even though I work for Sun, I personally have
probably less influence on Solaris than a customer. What I wrote/write
is my personal opinion, and I should insert the usual disclaimer about
me not 'officially' representing Sun Microsystems.

I personally do believe that silently failing or ignoring something
an application asks for explicitly is bad. If the application wants it
and does not get it, the OS should communicate this to the application.
I feel it is up to the application and not to the OS to decide how
to respond when the request fails.
It may be true that all or most applications will just redo it, or
they may do something else because ISM is not present; to be honest,
I do not know.
The code you suggested is IMHO a clean way to ask for an optimization,
gracefully accept the denial, and continue without it.

My guess is the absence of ISM on the VIA CPU is purely a hardware
issue and not related to a 'seldom-used version of Solaris', as there
are no different versions of Solaris, only different releases. If the
hardware does not support something, it may be difficult or impossible
for an OS to implement a feature. It would be nice, though, if every CPU
supported large pages, so the failure would never happen.


Regards
     Paul