Thread: Need help with phys backed shm segments (Postgresql+FreeBSD).

Need help with phys backed shm segments (Postgresql+FreeBSD).

From
Alfred Perlstein
Date:
On FreeBSD 4.1.1 and above there's a sysctl tunable called
kern.ipc.shm_use_phys. When set to 1, it's supposed to
make the kernel's handling of shared memory much more
efficient, at the expense of making the shm segment unpageable.

I tried to use this option with PostgreSQL 7.0.3 and FreeBSD 4.2,
but for some reason the spinlocks keep getting mucked up (there's
a log at the tail end of this message).

Anyone using Postgresql on FreeBSD probably wants this to work;
otherwise, extremely large chunks of shm combined with many active
backends can exhaust kernel memory.

I was wondering if any of the more experienced developers could
take a look at what's happening here.

Here's the log, the number in parens is the address of the lock,
on tas() the value printed to the right is the value in _ret,
for the others, it's the value before the lock count is set.

S_INIT_LOCK: (0x30048008) -> 0
S_UNLOCK: (0x30048008) -> 0
S_INIT_LOCK: (0x3004800c) -> 0
S_UNLOCK: (0x3004800c) -> 0
S_INIT_LOCK: (0x30048010) -> 0
S_UNLOCK: (0x30048010) -> 0
S_INIT_LOCK: (0x30048011) -> 0
S_UNLOCK: (0x30048011) -> 0
S_INIT_LOCK: (0x30048012) -> 0
S_UNLOCK: (0x30048012) -> 0
S_INIT_LOCK: (0x30048018) -> 0
S_UNLOCK: (0x30048018) -> 0
S_INIT_LOCK: (0x3004801c) -> 0
S_UNLOCK: (0x3004801c) -> 0
S_INIT_LOCK: (0x3004801d) -> 1
S_UNLOCK: (0x3004801d) -> 1
S_INIT_LOCK: (0x3004801e) -> 0
S_UNLOCK: (0x3004801e) -> 0
S_INIT_LOCK: (0x30048024) -> 127
S_UNLOCK: (0x30048024) -> 127
S_INIT_LOCK: (0x30048028) -> 255
S_UNLOCK: (0x30048028) -> 255
S_INIT_LOCK: (0x30048029) -> 0
S_UNLOCK: (0x30048029) -> 0
S_INIT_LOCK: (0x3004802a) -> 0
S_UNLOCK: (0x3004802a) -> 0
S_INIT_LOCK: (0x30048030) -> 1
S_UNLOCK: (0x30048030) -> 1
S_INIT_LOCK: (0x30048034) -> 0
S_UNLOCK: (0x30048034) -> 0
S_INIT_LOCK: (0x30048035) -> 0
S_UNLOCK: (0x30048035) -> 0
S_INIT_LOCK: (0x30048036) -> 0
S_UNLOCK: (0x30048036) -> 0
S_INIT_LOCK: (0x3004803c) -> 50
S_UNLOCK: (0x3004803c) -> 50
S_INIT_LOCK: (0x30048040) -> 10
S_UNLOCK: (0x30048040) -> 10
S_INIT_LOCK: (0x30048041) -> 0
S_UNLOCK: (0x30048041) -> 0
S_INIT_LOCK: (0x30048042) -> 0
S_UNLOCK: (0x30048042) -> 0
S_INIT_LOCK: (0x30048048) -> 1
S_UNLOCK: (0x30048048) -> 1
S_INIT_LOCK: (0x3004804c) -> 80
S_UNLOCK: (0x3004804c) -> 80
S_INIT_LOCK: (0x3004804d) -> 1
S_UNLOCK: (0x3004804d) -> 1
S_INIT_LOCK: (0x3004804e) -> 0
S_UNLOCK: (0x3004804e) -> 0
S_INIT_LOCK: (0x30048054) -> 0
S_UNLOCK: (0x30048054) -> 0
S_INIT_LOCK: (0x30048058) -> 1
S_UNLOCK: (0x30048058) -> 1
S_INIT_LOCK: (0x30048059) -> 1
S_UNLOCK: (0x30048059) -> 1
S_INIT_LOCK: (0x3004805a) -> 0
S_UNLOCK: (0x3004805a) -> 0
S_INIT_LOCK: (0x30048060) -> 0
S_UNLOCK: (0x30048060) -> 0
S_INIT_LOCK: (0x30048064) -> 0
S_UNLOCK: (0x30048064) -> 0
S_INIT_LOCK: (0x30048065) -> 0
S_UNLOCK: (0x30048065) -> 0
S_INIT_LOCK: (0x30048066) -> 0
S_UNLOCK: (0x30048066) -> 0
S_INIT_LOCK: (0x3004806c) -> 0
S_UNLOCK: (0x3004806c) -> 0
S_INIT_LOCK: (0x30048070) -> 0
S_UNLOCK: (0x30048070) -> 0
S_INIT_LOCK: (0x30048071) -> 0
S_UNLOCK: (0x30048071) -> 0
S_INIT_LOCK: (0x30048072) -> 0
S_UNLOCK: (0x30048072) -> 0
S_INIT_LOCK: (0x30048078) -> 0
S_UNLOCK: (0x30048078) -> 0
S_INIT_LOCK: (0x3004807c) -> 0
S_UNLOCK: (0x3004807c) -> 0
S_INIT_LOCK: (0x3004807d) -> 0
S_UNLOCK: (0x3004807d) -> 0
S_INIT_LOCK: (0x3004807e) -> 0
S_UNLOCK: (0x3004807e) -> 0
tas (0x30048054) -> 0
tas (0x30048059) -> 0
tas (0x30048058) -> 0
S_UNLOCK: (0x30048054) -> 1
tas (0x30048048) -> 0
tas (0x3004804d) -> 0
tas (0x3004804c) -> 0
S_UNLOCK: (0x30048048) -> 1
tas (0x30048048) -> 0
S_UNLOCK: (0x3004804c) -> 1
S_UNLOCK: (0x3004804d) -> 1
S_UNLOCK: (0x30048048) -> 1
tas (0x30048048) -> 0
tas (0x3004804d) -> 0
tas (0x3004804c) -> 0
S_UNLOCK: (0x30048048) -> 1
tas (0x30048048) -> 0
S_UNLOCK: (0x3004804c) -> 1
S_UNLOCK: (0x3004804d) -> 1
S_UNLOCK: (0x30048048) -> 1
tas (0x30048048) -> 0
tas (0x3004804d) -> 4
tas (0x3004804d) -> 1
tas (0x3004804d) -> 1
tas (0x3004804d) -> 1
tas (0x3004804d) -> 1
tas (0x3004804d) -> 1
tas (0x3004804d) -> 1
tas (0x3004804d) -> 1

repeats (it's stuck)


-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: Need help with phys backed shm segments (Postgresql+FreeBSD).

From
Tom Lane
Date:
Alfred Perlstein <bright@wintelcom.net> writes:
> Here's the log, the number in parens is the address of the lock,
> on tas() the value printed to the right is the value in _ret,
> for the others, it's the value before the lock count is set.

This looks to be the trace of a SpinAcquire()
(see src/backend/storage/ipc/spin.c):

> tas (0x30048048) -> 0
> tas (0x3004804d) -> 0
> tas (0x3004804c) -> 0
> S_UNLOCK: (0x30048048) -> 1

followed by SpinRelease():

> tas (0x30048048) -> 0
> S_UNLOCK: (0x3004804c) -> 1
> S_UNLOCK: (0x3004804d) -> 1
> S_UNLOCK: (0x30048048) -> 1

followed by a failed attempt to reacquire the same SLock:

> tas (0x30048048) -> 0
> tas (0x3004804d) -> 4
> tas (0x3004804d) -> 1
> tas (0x3004804d) -> 1
> tas (0x3004804d) -> 1
> tas (0x3004804d) -> 1

And that looks completely broken :-( ... something's clobbered the
exlock field of the SLock struct, apparently.  Are you sure this
kernel feature you're trying to use actually works?

BTW, if you're wondering why an SLock needs to contain *three*
hardware spinlocks, the answer is that it doesn't.  This code has
been greatly simplified in current sources...
        regards, tom lane
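
The acquire/release order read off the trace above can be modelled in a
few lines of C. This is only a sketch inferred from the log, not the code
in spin.c: SLockSketch, sketch_init, sketch_acquire, sketch_release and
the field names locklock and shlock are assumptions (only exlock is named
in the thread), C11 atomics stand in for the platform's tas(), and the
spin is bounded by a hypothetical max_spins so that the stuck case
returns instead of hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the 7.0.* SLock; only "exlock" is named in the thread. */
typedef struct {
    atomic_uchar locklock;  /* assumed name: taken first, released last */
    atomic_uchar shlock;    /* assumed name for the third hardware lock */
    atomic_uchar exlock;    /* the field that appears to be clobbered */
} SLockSketch;

/* Put all three hardware locks into the unlocked (0) state. */
static void sketch_init(SLockSketch *l)
{
    atomic_init(&l->locklock, 0);
    atomic_init(&l->shlock, 0);
    atomic_init(&l->exlock, 0);
}

/* Test-and-set: store 1 and return the previous value; 0 means acquired. */
static int tas(atomic_uchar *lock)
{
    return atomic_exchange(lock, 1);
}

/* Release one hardware lock. */
static void s_unlock(atomic_uchar *lock)
{
    atomic_store(lock, 0);
}

/* Spin on one hardware lock, giving up after max_spins attempts. */
static bool spin_on(atomic_uchar *lock, int max_spins)
{
    for (int i = 0; i < max_spins; i++)
        if (tas(lock) == 0)
            return true;
    return false;
}

/* Order seen in the trace: tas(locklock), tas(exlock), tas(shlock), S_UNLOCK(locklock). */
static bool sketch_acquire(SLockSketch *l, int max_spins)
{
    if (!spin_on(&l->locklock, max_spins))
        return false;
    if (!spin_on(&l->exlock, max_spins))
        return false;   /* where the log gets stuck; locklock stays held */
    if (!spin_on(&l->shlock, max_spins))
        return false;
    s_unlock(&l->locklock);
    return true;
}

/* Order seen in the trace: tas(locklock), then unlock shlock, exlock, locklock. */
static bool sketch_release(SLockSketch *l, int max_spins)
{
    if (!spin_on(&l->locklock, max_spins))
        return false;
    s_unlock(&l->shlock);
    s_unlock(&l->exlock);
    s_unlock(&l->locklock);
    return true;
}
```

Storing a nonzero value such as 4 into exlock before calling
sketch_acquire() gives the same pattern as the log: the first tas()
returns 4, every later one returns 1, and the acquire never succeeds.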


Re: Need help with phys backed shm segments (Postgresql+FreeBSD).

From
Alfred Perlstein
Date:
* Tom Lane <tgl@sss.pgh.pa.us> [001205 07:43] wrote:
> Alfred Perlstein <bright@wintelcom.net> writes:
> > Here's the log, the number in parens is the address of the lock,
> > on tas() the value printed to the right is the value in _ret,
> > for the others, it's the value before the lock count is set.
> 
> This looks to be the trace of a SpinAcquire()
> (see src/backend/storage/ipc/spin.c):

Yes, those are my debug printfs :).

> > tas (0x30048048) -> 0
> > tas (0x3004804d) -> 0
> > tas (0x3004804c) -> 0
> > S_UNLOCK: (0x30048048) -> 1
> 
> followed by SpinRelease():
> 
> > tas (0x30048048) -> 0
> > S_UNLOCK: (0x3004804c) -> 1
> > S_UNLOCK: (0x3004804d) -> 1
> > S_UNLOCK: (0x30048048) -> 1
> 
> followed by a failed attempt to reacquire the same SLock:
> 
> > tas (0x30048048) -> 0
> > tas (0x3004804d) -> 4
> > tas (0x3004804d) -> 1
> > tas (0x3004804d) -> 1
> > tas (0x3004804d) -> 1
> > tas (0x3004804d) -> 1
> 
> And that looks completely broken :-( ... something's clobbered the
> exlock field of the SLock struct, apparently.  Are you sure this
> kernel feature you're trying to use actually works?

No I'm not sure actually. :)  I'll look into it further, but I
was wondering if there was something I could do to debug the
locks better.  I think I'll add some S_MAGIC or something in
the struct to see if the whole thing is getting clobbered or
what...  If you have any suggestions let me know.

> BTW, if you're wondering why an SLock needs to contain *three*
> hardware spinlocks, the answer is that it doesn't.  This code has
> been greatly simplified in current sources...

It did look a bit strange...

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: Need help with phys backed shm segments (Postgresql+FreeBSD).

From
Tom Lane
Date:
BTW, I just remembered that in 7.0.*, the SLocks that are managed by
SpinAcquire() all live in their own little shm segment.  On a machine
where slock_t is char, it'd likely only amount to 128 bytes or so.
Maybe you are seeing some bug in FreeBSD's handling of tiny shm
segments?
        regards, tom lane


Re: Need help with phys backed shm segments (Postgresql+FreeBSD).

From
Tom Lane
Date:
Alfred Perlstein <bright@wintelcom.net> writes:
> No I'm not sure actually. :)  I'll look into it further, but I
> was wondering if there was something I could do to debug the
> locks better.  I think I'll add some S_MAGIC or something in
> the struct to see if the whole thing is getting clobbered or
> what...  If you have any suggestions let me know.

Seems like a plan.  In current sources I have moved the SLock struct
declaration out of header files and into spin.c; it doesn't really
need to be known anywhere else.  You could probably do the same in
7.0.*, which would greatly simplify changing the struct around to
see what's happening.
        regards, tom lane
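
A rough sketch of the S_MAGIC idea discussed above, assuming guard words
placed before and after the lock bytes. GuardedSLock, guarded_init,
guarded_intact and the guard value itself are hypothetical; locks[3]
merely stands in for the three hardware spinlocks and is not the real
SLock layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define S_MAGIC 0x534C4F43u   /* hypothetical guard value, ASCII "SLOC" */

/* Lock bytes bracketed by guard words (illustrative layout only). */
typedef struct {
    uint32_t      magic_head;
    unsigned char locks[3];
    uint32_t      magic_tail;
} GuardedSLock;

/* Set both guards and clear the lock bytes. */
static void guarded_init(GuardedSLock *l)
{
    l->magic_head = S_MAGIC;
    memset(l->locks, 0, sizeof l->locks);
    l->magic_tail = S_MAGIC;
}

/* True while both guard words still hold the value written at init time. */
static bool guarded_intact(const GuardedSLock *l)
{
    return l->magic_head == S_MAGIC && l->magic_tail == S_MAGIC;
}
```

If a lock byte holds garbage while guarded_intact() still returns true,
only that field was hit; if the guards are gone as well, the whole
struct (or the whole segment) was overwritten.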


Re: Need help with phys backed shm segments (Postgresql+FreeBSD).

From
Alfred Perlstein
Date:
* Tom Lane <tgl@sss.pgh.pa.us> [001205 08:37] wrote:
> BTW, I just remembered that in 7.0.*, the SLocks that are managed by
> SpinAcquire() all live in their own little shm segment.  On a machine
> where slock_t is char, it'd likely only amount to 128 bytes or so.
> Maybe you are seeing some bug in FreeBSD's handling of tiny shm
> segments?

Good call, I think I found it! :)

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: Need help with phys backed shm segments (Postgresql+FreeBSD).

From
Alfred Perlstein
Date:
* Alfred Perlstein <bright@wintelcom.net> [001205 12:30] wrote:
> * Tom Lane <tgl@sss.pgh.pa.us> [001205 08:37] wrote:
> > BTW, I just remembered that in 7.0.*, the SLocks that are managed by
> > SpinAcquire() all live in their own little shm segment.  On a machine
> > where slock_t is char, it'd likely only amount to 128 bytes or so.
> > Maybe you are seeing some bug in FreeBSD's handling of tiny shm
> > segments?
> 
> Good call, I think I found it! :)

Here's the patch I'm using on FreeBSD, it seems to work, if any
other FreeBSD'ers want to try it out, just apply the patch:
cd /usr/src/sys/vm ; patch < patchfile

and recompile and boot with a new kernel, then do this:

sysctl -w kern.ipc.shm_use_phys=1

or add:
kern.ipc.shm_use_phys=1 
to /etc/sysctl.conf

Let me know if it works.

thanks,
-Alfred

Index: phys_pager.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/phys_pager.c,v
retrieving revision 1.3.2.1
diff -u -u -r1.3.2.1 phys_pager.c
--- phys_pager.c    2000/08/04 22:31:11    1.3.2.1
+++ phys_pager.c    2000/12/05 20:13:25
@@ -83,7 +83,7 @@
          * Allocate object and associate it with the pager.
          */
         object = vm_object_allocate(OBJT_PHYS,
-            OFF_TO_IDX(foff + size));
+            OFF_TO_IDX(foff + PAGE_MASK + size));
         object->handle = handle;
         TAILQ_INSERT_TAIL(&phys_pager_object_list, object,
             pager_object_list);
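
A worked example of the rounding this patch changes, assuming 4 KiB
pages (a PAGE_SHIFT of 12) and macros that mirror the kernel's
OFF_TO_IDX and PAGE_MASK; pages_before and pages_after are hypothetical
helper names. For a segment of roughly 128 bytes, the size estimated
earlier in the thread, the truncating form yields zero pages, while the
patched form rounds up to one.

```c
#include <assert.h>

/* Assumed values: 4 KiB pages, as on i386. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (PAGE_SIZE - 1)

/* Byte offset to page index, truncating (modelled on the kernel macro). */
#define OFF_TO_IDX(off) ((unsigned long)(off) >> PAGE_SHIFT)

/* Object size in pages as computed before the patch. */
static unsigned long pages_before(unsigned long foff, unsigned long size)
{
    return OFF_TO_IDX(foff + size);
}

/* Object size in pages as computed after the patch (rounds up). */
static unsigned long pages_after(unsigned long foff, unsigned long size)
{
    return OFF_TO_IDX(foff + PAGE_MASK + size);
}
```

With these definitions, pages_before(0, 128) is 0 and pages_after(0, 128)
is 1; for sizes that are an exact multiple of the page size the two
agree.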
 


Re: Need help with phys backed shm segments (Postgresql+FreeBSD).

From
Oleg Bartunov
Date:
Alfred,

do you have any performance numbers with and without your patch?
You could use the pg_check utility.
Oleg
On Tue, 5 Dec 2000, Alfred Perlstein wrote:

> Date: Tue, 5 Dec 2000 13:04:45 -0800
> From: Alfred Perlstein <bright@wintelcom.net>
> To: Tom Lane <tgl@sss.pgh.pa.us>
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Need help with phys backed shm segments (Postgresql+FreeBSD).
> 
> * Alfred Perlstein <bright@wintelcom.net> [001205 12:30] wrote:
> > * Tom Lane <tgl@sss.pgh.pa.us> [001205 08:37] wrote:
> > > BTW, I just remembered that in 7.0.*, the SLocks that are managed by
> > > SpinAcquire() all live in their own little shm segment.  On a machine
> > > where slock_t is char, it'd likely only amount to 128 bytes or so.
> > > Maybe you are seeing some bug in FreeBSD's handling of tiny shm
> > > segments?
> > 
> > Good call, I think I found it! :)
> 
> Here's the patch I'm using on FreeBSD, it seems to work, if any
> other FreeBSD'ers want to try it out, just apply the patch:
> cd /usr/src/sys/vm ; patch < patchfile
> 
> and recompile and boot with a new kernel, then do this:
> 
> sysctl -w kern.ipc.shm_use_phys=1
> 
> or add:
> kern.ipc.shm_use_phys=1 
> to /etc/sysctl.conf
> 
> Let me know if it works.
> 
> thanks,
> -Alfred
> 
> Index: phys_pager.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/vm/phys_pager.c,v
> retrieving revision 1.3.2.1
> diff -u -u -r1.3.2.1 phys_pager.c
> --- phys_pager.c    2000/08/04 22:31:11    1.3.2.1
> +++ phys_pager.c    2000/12/05 20:13:25
> @@ -83,7 +83,7 @@
>           * Allocate object and associate it with the pager.
>           */
>          object = vm_object_allocate(OBJT_PHYS,
> -            OFF_TO_IDX(foff + size));
> +            OFF_TO_IDX(foff + PAGE_MASK + size));
>          object->handle = handle;
>          TAILQ_INSERT_TAIL(&phys_pager_object_list, object,
>              pager_object_list);
> 

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: Need help with phys backed shm segments (Postgresql+FreeBSD).

From
Randy Jonasz
Date:
Just as interesting

On Tue, 5 Dec 2000, Alfred Perlstein wrote:

> * Alfred Perlstein <bright@wintelcom.net> [001205 12:30] wrote:
> > * Tom Lane <tgl@sss.pgh.pa.us> [001205 08:37] wrote:
> > > BTW, I just remembered that in 7.0.*, the SLocks that are managed by
> > > SpinAcquire() all live in their own little shm segment.  On a machine
> > > where slock_t is char, it'd likely only amount to 128 bytes or so.
> > > Maybe you are seeing some bug in FreeBSD's handling of tiny shm
> > > segments?
> >
> > Good call, I think I found it! :)
>
> Here's the patch I'm using on FreeBSD, it seems to work, if any
> other FreeBSD'ers want to try it out, just apply the patch:
> cd /usr/src/sys/vm ; patch < patchfile
>
> and recompile and boot with a new kernel, then do this:
>
> sysctl -w kern.ipc.shm_use_phys=1
>
> or add:
> kern.ipc.shm_use_phys=1
> to /etc/sysctl.conf
>
> Let me know if it works.
>
> thanks,
> -Alfred
>
> Index: phys_pager.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/vm/phys_pager.c,v
> retrieving revision 1.3.2.1
> diff -u -u -r1.3.2.1 phys_pager.c
> --- phys_pager.c    2000/08/04 22:31:11    1.3.2.1
> +++ phys_pager.c    2000/12/05 20:13:25
> @@ -83,7 +83,7 @@
>           * Allocate object and associate it with the pager.
>           */
>          object = vm_object_allocate(OBJT_PHYS,
> -            OFF_TO_IDX(foff + size));
> +            OFF_TO_IDX(foff + PAGE_MASK + size));
>          object->handle = handle;
>          TAILQ_INSERT_TAIL(&phys_pager_object_list, object,
>              pager_object_list);
>
>

Randy Jonasz
Software Engineer
Click2net Inc.
Web:  http://www.click2net.com
Phone: (905) 271-3550

"You cannot possibly pay a philosopher what he's worth,
but try your best" -- Aristotle



Re: Need help with phys backed shm segments (Postgresql+FreeBSD).

From
Alfred Perlstein
Date:
* Oleg Bartunov <oleg@sai.msu.su> [001205 13:33] wrote:
> Alfred,
> 
> do you have any numbers with and without your patch ?
> I mean performance. You may use pg_check utility.

Er, I just made the patch a couple of hours ago, and I'm also
dealing with some other FreeBSD issues right now.  I will report
on it as soon as I can.

Theoretically you'll only see performance gains when doing fork().
The real intent here is to allow for giant segments: without
kern.ipc.shm_use_phys=1, running, let's say, 768meg (out of 1gig)
shared memory segments will probably cause performance problems
because of the amount of swap structures needed per-process to
manage swappable segments.
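
To put rough numbers on the giant-segment case, here is a small sketch.
It assumes 4 KiB pages, reads "768meg" as 768 MiB, and assumes the
bookkeeping grows with pages times attached processes; these are
illustrative assumptions, and the thread gives no per-entry sizes.
segment_pages and tracking_entries are hypothetical names.

```c
#include <assert.h>

/* Assumed page size: 4 KiB. */
#define PAGE_SIZE_BYTES 4096UL

/* Number of pages needed to cover a segment of the given size. */
static unsigned long segment_pages(unsigned long segment_bytes)
{
    return (segment_bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Per-page entries if every attached process needs its own set. */
static unsigned long long tracking_entries(unsigned long segment_bytes,
                                           unsigned int processes)
{
    return (unsigned long long)segment_pages(segment_bytes) * processes;
}
```

For a 768 MiB segment this gives 196608 pages; with, say, 100 backends
attached, that is 19660800 per-process page entries under these
assumptions.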

I'm going to be enabling this on one of our boxes and see if it
makes a noticeable difference.  I'll let you guys know.

> > Date: Tue, 5 Dec 2000 13:04:45 -0800
> > From: Alfred Perlstein <bright@wintelcom.net>
> > To: Tom Lane <tgl@sss.pgh.pa.us>
> > Cc: pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] Need help with phys backed shm segments (Postgresql+FreeBSD).
> > 
> > Here's the patch I'm using on FreeBSD, it seems to work, if any
> > other FreeBSD'ers want to try it out, just apply the patch:
> > cd /usr/src/sys/vm ; patch < patchfile
> > 
> > and recompile and boot with a new kernel, then do this:
> > 
> > sysctl -w kern.ipc.shm_use_phys=1
> > 
> > or add:
> > kern.ipc.shm_use_phys=1 
> > to /etc/sysctl.conf
> > 
> > Let me know if it works.
> > 
> > thanks,
> > -Alfred