Thread: Re: postgresql and process titles

Re: postgresql and process titles

From
"Marc G. Fournier"
Date:
The only way that I'm aware of for disabling this is at compile time ... 
after running configure, you want to modify:

src/include/pg_config.h

and undef HAVE_SETPROCTITLE ...

I'm CC'ing -hackers about this though, since I think you are the first to 
point to setproctitle() as being a serious performance bottleneck ...



On Sun, 11 Jun 2006, Kris Kennaway wrote:

> Why does postgresql change its process title so frequently and how can
> this be disabled?  Profiling suggests it's a fairly serious
> performance bottleneck.
>
> Kris
>

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664


Re: postgresql and process titles

From
Tom Lane
Date:
> On Sun, 11 Jun 2006, Kris Kennaway wrote:
>> Why does postgresql change its process title so frequently and how can
>> this be disabled?  Profiling suggests it's a fairly serious
>> performance bottleneck.

Let's see the evidence.
        regards, tom lane


Re: postgresql and process titles

From
Tom Lane
Date:
Kris Kennaway <kris@obsecurity.org> writes:
> On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
>> Let's see the evidence.

> The calls to setproctitle() (it looks like 4 setproctitle syscalls per
> DB query) are causing contention on the Giant lock 25% of the time on
> a dual p4 + HTT.  Disabling process title setting completely gives an
> 8% peak performance boost to the super-smack select benchmark.

I think you misunderstood me: I asked for evidence, not interpretation.
What are you measuring, and with what tool, and what are the numbers?
On what benchmark case?  And what did you do to "disable process title
setting completely"?

The reason I'm being doubting Thomas here is that I've never seen any
indication on any other platform that ps_status is a major bottleneck.
Now maybe FreeBSD really sucks, or maybe you're onto something of
interest, but let's see the proof in a form that someone else can
check and reproduce.
        regards, tom lane


Re: postgresql and process titles

From
Kris Kennaway
Date:
On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
> Kris Kennaway <kris@obsecurity.org> writes:
> > On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
> >> Let's see the evidence.
>
> > The calls to setproctitle() (it looks like 4 setproctitle syscalls per
> > DB query) are causing contention on the Giant lock 25% of the time on
> > a dual p4 + HTT.  Disabling process title setting completely gives an
> > 8% peak performance boost to the super-smack select benchmark.
>
> I think you misunderstood me: I asked for evidence, not interpretation.
> What are you measuring, and with what tool, and what are the numbers?
> On what benchmark case?

As I said, I'm using the super-smack select benchmark; presumably you
are aware of it.

> And what did you do to "disable process title setting completely"?

Added this to ps_status.c:

#undef PS_USE_SETPROCTITLE
#undef PS_USE_PSTAT
#undef PS_USE_PS_STRINGS
#undef PS_USE_CHANGE_ARGV
#undef PS_USE_CLOBBER_ARGV
#define PS_USE_NONE

Here are the queries/second data, before and after:

x pg
+ pg-noproctitle
+------------------------------------------------------------+
|x  xx                                                  +    |
|x  xx                                                  +  + |
|xx xxx                                    +        + + ++++ |
| |AM|                                            |____AM___||
+------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  11       3399.29        3425.1       3413.93     3411.8418     9.4287675
+  10       3615.63       3699.32      3685.515      3679.344     24.894177
Difference at 95.0% confidence
        267.502 +/- 16.871
        7.8404% +/- 0.494483%
        (Student's t, pooled s = 18.4484)
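
The summary line can be checked from the table alone. What follows is a minimal sketch of the equal-variance (pooled) Student's t interval, not the code of whatever tool produced the output above. The function name is made up for illustration, and the critical value 2.093 (two-sided 95%, 19 degrees of freedom) is an assumption taken from a standard t table rather than computed.

```python
import math

def pooled_t_interval(n1, mean1, sd1, n2, mean2, sd2, t_crit):
    """Return (difference, half_width, pooled_sd) for two independent samples.

    Uses the equal-variance (pooled) Student's t interval.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    std_err = pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return mean2 - mean1, t_crit * std_err, pooled_sd

# Summary statistics copied from the table above.
T_CRIT_95_DF19 = 2.093  # assumption: two-sided 95% value for 19 d.f. from a t table
diff, half_width, pooled_sd = pooled_t_interval(
    11, 3411.8418, 9.4287675,   # "pg" runs
    10, 3679.344, 24.894177,    # "pg-noproctitle" runs
    T_CRIT_95_DF19,
)
print(f"difference: {diff:.3f} +/- {half_width:.3f}")
print(f"relative:   {100 * diff / 3411.8418:.4f}%")
print(f"pooled s:   {pooled_sd:.4f}")
```

The printed values should agree with the posted difference, percentage and pooled s to within rounding.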

Kris

Re: postgresql and process titles

From
Kris Kennaway
Date:
On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
> > On Sun, 11 Jun 2006, Kris Kennaway wrote:
> >> Why does postgresql change its process title so frequently and how can
> >> this be disabled?  Profiling suggests it's a fairly serious
> >> performance bottleneck.
>
> Let's see the evidence.

The calls to setproctitle() (it looks like 4 setproctitle syscalls per
DB query) are causing contention on the Giant lock 25% of the time on
a dual p4 + HTT.  Disabling process title setting completely gives an
8% peak performance boost to the super-smack select benchmark.

It's actually worse than that in stock FreeBSD because in certain
"mixed" workloads with other sources of Giant contention present they
will contend with each other to make overall machine performance much
worse; that's the "serious" aspect, although 8% performance loss just
in case someone should ever want to run 'ps' to check the state of the
database processes is nothing to laugh at either.

Kris

Re: postgresql and process titles

From
Kris Kennaway
Date:
On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
> On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
> > Kris Kennaway <kris@obsecurity.org> writes:
> > > On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
> > >> Let's see the evidence.
> >
> > > The calls to setproctitle() (it looks like 4 setproctitle syscalls per
> > > DB query) are causing contention on the Giant lock 25% of the time on
> > > a dual p4 + HTT.  Disabling process title setting completely gives an
> > > 8% peak performance boost to the super-smack select benchmark.
> >
> > I think you misunderstood me: I asked for evidence, not interpretation.
> > What are you measuring, and with what tool, and what are the numbers?
> > On what benchmark case?  And what did you do to "disable process title
> > setting completely"?
> >
> > The reason I'm being doubting Thomas here is that I've never seen any
>
> Ba-da-bum!
>
> > indication on any other platform that ps_status is a major bottleneck.
> > Now maybe FreeBSD really sucks, or maybe you're onto something of
> > interest, but let's see the proof in a form that someone else can
> > check and reproduce.
>
> It's also important to find out what version of FreeBSD this is. A lot
> of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
> possible this isn't an issue in newer versions.

It's still true in 6.x and 7.x.  I have a patch that removes Giant
from the sysctl in question, and I have also removed it from another
relevant part of the kernel (semop() is bogusly acquiring Giant when
it is supposed to be mpsafe).

When it's possible to commit that patch (should be in time for 7.0,
but not sure if it will make it into 6.2) it will eliminate the worst
part of the problem, but it still leaves postgresql making thousands
of syscalls per second to flip its process titles back and forth,
which needs to be looked at carefully for a performance impact.  For
now, users of FreeBSD who want that 8% should turn it off though (or
maybe one of the alternative methods is usable).
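
To put "thousands of syscalls per second" in numbers, here is a back-of-the-envelope sketch using only figures posted earlier in the thread. It assumes the estimate of 4 setproctitle calls per query holds and that the benchmark's queries/second figure is the aggregate across all backends; the function name is made up for illustration.

```python
SETPROCTITLE_CALLS_PER_QUERY = 4   # estimate from the thread ("it looks like 4")
MEDIAN_QUERIES_PER_SEC = 3413.93   # median of the unpatched "pg" runs

def title_syscalls_per_sec(queries_per_sec,
                           calls_per_query=SETPROCTITLE_CALLS_PER_QUERY):
    """Estimate process-title syscalls per second across all backends."""
    return queries_per_sec * calls_per_query

estimate = title_syscalls_per_sec(MEDIAN_QUERIES_PER_SEC)
print(f"about {round(estimate)} setproctitle calls per second")
```

Under those assumptions the rate is on the order of 13-14 thousand calls per second at the measured throughput.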

BTW, another promising performance/scalability change on BSD systems
would be to convert pgsql to use kqueue instead of select, since mutex
profiling traces show a lot of contention on the select lock in
FreeBSD.

FYI, the biggest source of contention is via semop() - it might be
possible to optimize that some more in FreeBSD, I don't know.

Kris

Re: postgresql and process titles

From
"Jim C. Nasby"
Date:
On Mon, Jun 12, 2006 at 12:24:36AM -0400, Kris Kennaway wrote:
> On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
> > On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
> > > Kris Kennaway <kris@obsecurity.org> writes:
> > > > On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
> > > >> Let's see the evidence.
> > > 
> > > > The calls to setproctitle() (it looks like 4 setproctitle syscalls per
> > > > DB query) are causing contention on the Giant lock 25% of the time on
> > > > a dual p4 + HTT.  Disabling process title setting completely gives an
> > > > 8% peak performance boost to the super-smack select benchmark.
> > > 
> > > I think you misunderstood me: I asked for evidence, not interpretation.
> > > What are you measuring, and with what tool, and what are the numbers?
> > > On what benchmark case?  And what did you do to "disable process title
> > > setting completely"?
> > > 
> > > The reason I'm being doubting Thomas here is that I've never seen any
> > 
> > Ba-da-bum!
> > 
> > > indication on any other platform that ps_status is a major bottleneck.
> > > Now maybe FreeBSD really sucks, or maybe you're onto something of
> > > interest, but let's see the proof in a form that someone else can
> > > check and reproduce.
> > 
> > It's also important to find out what version of FreeBSD this is. A lot
> > of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
> > possible this isn't an issue in newer versions.

Can you provide the actual commands you used to set up and run the test?
This would allow others to duplicate your results and debug the
situation on their own. This is also important because we've generally
found HTT to be a loss (except on Windows), so it'd be good to see what
impact this has on AMD hardware. It would also be very useful to have
the raw test result numbers you obtained.

> It's still true in 6.x and 7.x.  I have a patch that removes Giant
> from the sysctl in question, and I have also removed it from another
> relevant part of the kernel (semop() is bogusly acquiring Giant when
> it is supposed to be mpsafe).

What effect did that patch have on the numbers? And where is it, in case
anyone here wants to try it?

> When it's possible to commit that patch (should be in time for 7.0,
> but not sure if it will make it into 6.2) it will eliminate the worst
> part of the problem, but it still leaves postgresql making thousands
> of syscalls per second to flip its process titles back and forth,
> which needs to be looked at carefully for a performance impact.  For
> now, users of FreeBSD who want that 8% should turn it off though (or
> maybe one of the alternative methods is usable).

We have a similar issue internally with stats_command_string. The issue
is that it's very helpful to be able to see what a 'long running' (more
than a second or so) statement is doing, but there's very little value
in doing all that work for a statement that's only going to run for a
few ms. If there's a very cheap way to set some kind of a timer that
would update this information once a statement's been around long enough
that might be a way to handle this (I don't know if we're already using
ALRM in the backend, or if that's a cheap enough solution).

> FYI, the biggest source of contention is via semop() - it might be
> possible to optimize that some more in FreeBSD, I don't know.

Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
of procs in either semwait or semlock. :(
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: postgresql and process titles

From
Kris Kennaway
Date:
On Mon, Jun 12, 2006 at 10:08:22AM -0500, Jim C. Nasby wrote:
> On Mon, Jun 12, 2006 at 12:24:36AM -0400, Kris Kennaway wrote:
> > On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
> > > On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
> > > > Kris Kennaway <kris@obsecurity.org> writes:
> > > > > On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
> > > > >> Let's see the evidence.
> > > >
> > > > > The calls to setproctitle() (it looks like 4 setproctitle syscalls per
> > > > > DB query) are causing contention on the Giant lock 25% of the time on
> > > > > a dual p4 + HTT.  Disabling process title setting completely gives an
> > > > > 8% peak performance boost to the super-smack select benchmark.
> > > >
> > > > I think you misunderstood me: I asked for evidence, not interpretation.
> > > > What are you measuring, and with what tool, and what are the numbers?
> > > > On what benchmark case?  And what did you do to "disable process title
> > > > setting completely"?
> > > >
> > > > The reason I'm being doubting Thomas here is that I've never seen any
> > >
> > > Ba-da-bum!
> > >
> > > > indication on any other platform that ps_status is a major bottleneck.
> > > > Now maybe FreeBSD really sucks, or maybe you're onto something of
> > > > interest, but let's see the proof in a form that someone else can
> > > > check and reproduce.
> > >
> > > It's also important to find out what version of FreeBSD this is. A lot
> > > of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
> > > possible this isn't an issue in newer versions.
>
> Can you provide the actual commands you used to set up and run the test?

I actually forget all the steps I needed to do to get super-smack
working with postgresql since it required a lot of trial and error for
a database newbie like me (compiling it from the
benchmarks/super-smack port was trivial, but unlike mysql it required
configuring the database by hand - this should hopefully be more
obvious to someone familiar with pgsql though).

It would be great if someone on your end could make this easier, BTW -
e.g. at least document the steps.  Also super-smack should be changed
to allow use via a local socket with pgsql (this is the default with
mysql) - this avoids measuring network stack overhead.

The only thing I had to change on FreeBSD was to edit the
select-key.smack and change "localhost" to "127.0.0.1" in two
locations to avoid possibly using IPv6 transport.

> This would allow others to duplicate your results and debug the
> situation on their own. This is also important because we've generally
> found HTT to be a loss (except on Windows), so it'd be good to see what
> impact this has on AMD hardware. It would also be very useful to have
> the raw test result numbers you obtained.

They were posted previously.  This is Intel hardware (AMD doesn't do
HTT), and I didn't retest without HTT.  I'll try to do so if I have
the time (however previously when profiling mysql, HTT did give a small
positive change).

> > It's still true in 6.x and 7.x.  I have a patch that removes Giant
> > from the sysctl in question, and I have also removed it from another
> > relevant part of the kernel (semop() is bogusly acquiring Giant when
> > it is supposed to be mpsafe).
>
> What effect did that patch have on the numbers? And where is it, in case
> anyone here wants to try it?

I didn't yet retest with the patch.  It's in my perforce branch:
 http://perforce.freebsd.org/changeList.cgi?FSPC=//depot/user/kris/contention/...

although you probably need a combination of the changes in order for
it to be usable.

> > When it's possible to commit that patch (should be in time for 7.0,
> > but not sure if it will make it into 6.2) it will eliminate the worst
> > part of the problem, but it still leaves postgresql making thousands
> > of syscalls per second to flip its process titles back and forth,
> > which needs to be looked at carefully for a performance impact.  For
> > now, users of FreeBSD who want that 8% should turn it off though (or
> > maybe one of the alternative methods is usable).
>
> We have a similar issue internally with stats_command_string. The issue
> is that it's very helpful to be able to see what a 'long running' (more
> than a second or so) statement is doing, but there's very little value
> in doing all that work for a statement that's only going to run for a
> few ms. If there's a very cheap way to set some kind of a timer that
> would update this information once a statement's been around long enough
> that might be a way to handle this (I don't know if we're already using
> ALRM in the backend, or if that's a cheap enough solution).

I don't know what the best way to implement it would be, but limiting
the frequency of these updates does seem to be the way to go.

> > FYI, the biggest source of contention is via semop() - it might be
> > possible to optimize that some more in FreeBSD, I don't know.
>
> Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
> of procs in either semwait or semlock. :(

Part of that is Giant contention again; for example on 6.x semop() and
setproctitle() both want to acquire it, so they'll fight with each
other and with anything else on the system that wants Giant
(e.g. IPv6, or the USB stack, etc).  As I mentioned Giant is not an
issue here going forward, but there is still as much lock contention
just between semop() calls running on different CPUs.  It may be
possible for someone to implement more fine-grained locking here, but
I don't know if there is available interest.

Kris

Re: postgresql and process titles

From
"Jim C. Nasby"
Date:
On Mon, Jun 12, 2006 at 11:38:01AM -0400, Kris Kennaway wrote:
> On Mon, Jun 12, 2006 at 10:08:22AM -0500, Jim C. Nasby wrote:
> > On Mon, Jun 12, 2006 at 12:24:36AM -0400, Kris Kennaway wrote:
> > > On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
> > > > On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
> > > > > Kris Kennaway <kris@obsecurity.org> writes:
> > > > > > On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
> > > > > >> Let's see the evidence.
> > > > > 
> > > > > > The calls to setproctitle() (it looks like 4 setproctitle syscalls per
> > > > > > DB query) are causing contention on the Giant lock 25% of the time on
> > > > > > a dual p4 + HTT.  Disabling process title setting completely gives an
> > > > > > 8% peak performance boost to the super-smack select benchmark.
> > > > > 
> > > > > I think you misunderstood me: I asked for evidence, not interpretation.
> > > > > What are you measuring, and with what tool, and what are the numbers?
> > > > > On what benchmark case?  And what did you do to "disable process title
> > > > > setting completely"?
> > > > > 
> > > > > The reason I'm being doubting Thomas here is that I've never seen any
> > > > 
> > > > Ba-da-bum!
> > > > 
> > > > > indication on any other platform that ps_status is a major bottleneck.
> > > > > Now maybe FreeBSD really sucks, or maybe you're onto something of
> > > > > interest, but let's see the proof in a form that someone else can
> > > > > check and reproduce.
> > > > 
> > > > It's also important to find out what version of FreeBSD this is. A lot
> > > > of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
> > > > possible this isn't an issue in newer versions.
> >  
> > Can you provide the actual commands you used to set up and run the test?
> 
> I actually forget all the steps I needed to do to get super-smack
> working with postgresql since it required a lot of trial and error for
> a database newbie like me (compiling it from the
> benchmarks/super-smack port was trivial, but unlike mysql it required
> configuring the database by hand - this should hopefully be more
> obvious to someone familiar with pgsql though).
> 
> It would be great if someone on your end could make this easier, BTW -
> e.g. at least document the steps.  Also super-smack should be changed
> to allow use via a local socket with pgsql (this is the default with
> mysql) - this avoids measuring network stack overhead.
Unless supersmack has improved substantially, you're unlikely to find
much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3
(http://sourceforge.net/projects/osdldbt) is much more realistic (based
on TPC-C and TPC-H).
> > > FYI, the biggest source of contention is via semop() - it might be
> > > possible to optimize that some more in FreeBSD, I don't know.
> > 
> > Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
> > of procs in either semwait or semlock. :(
> 
> Part of that is Giant contention again; for example on 6.x semop() and
> setproctitle() both want to acquire it, so they'll fight with each
> other and with anything else on the system that wants Giant
> (e.g. IPv6, or the USB stack, etc).  As I mentioned Giant is not an
> issue here going forward, but there is still as much lock contention
> just between semop() calls running on different CPUs.  It may be
> possible for someone to implement more fine-grained locking here, but
> I don't know if there is available interest.

FWIW, turning off stats_command_string substantially reduced
this contention, so it appears the issue is somewhere in the stats code.
This code sends stats messages to a different process via a socket (or
is it UDP?), with the intention that if the system gets heavily loaded
we'll lose some stats in the interest of not bogging down all the
backends. It seems that doesn't work so hot on FreeBSD. :(
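
The "lose some stats rather than bog down the backends" idea can be sketched as a non-blocking datagram send. This is only an illustration of the general technique, not PostgreSQL's stats code; the function names, the payload and the loopback "collector" are all hypothetical.

```python
import socket

def send_stat(sock, collector_addr, payload):
    """Try to send one stats datagram; return False instead of blocking."""
    try:
        sock.sendto(payload, collector_addr)
        return True
    except (BlockingIOError, InterruptedError):
        # Send buffer is full: drop the message rather than stall the caller.
        return False

def demo():
    # Hypothetical collector listening on an ephemeral loopback port.
    collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    collector.bind(("127.0.0.1", 0))
    collector.settimeout(2.0)

    backend = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    backend.setblocking(False)  # never wait on the collector
    try:
        sent = send_stat(backend, collector.getsockname(), b"SELECT 1")
        received = collector.recv(1024) if sent else None
        return sent, received
    finally:
        backend.close()
        collector.close()
```

Note that with UDP the kernel may also discard a datagram silently when the receiver's buffer is full, so a True return only means the send did not block. The sender still pays for one syscall per message, which is the cost being discussed in this thread.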
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: postgresql and process titles

From
Bruce Momjian
Date:
Jim C. Nasby wrote:
> FWIW, turning off stats_command_string substantially reduced
> this contention, so it appears the issue is somewhere in the stats code.
> This code sends stats messages to a different process via a socket (or
> is it UDP?), with the intention that if the system gets heavily loaded
> we'll lose some stats in the interest of not bogging down all the
> backends. It seems that doesn't work so hot on FreeBSD. :(

I am working on a patch for 8.2 to fix that for all platforms.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: postgresql and process titles

From
"Jim C. Nasby"
Date:
On Tue, Jun 13, 2006 at 02:10:15PM -0400, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > FWIW, turning off stats_command_string substantially reduced
> > this contention, so it appears the issue is somewhere in the stats code.
> > This code sends stats messages to a different process via a socket (or
> > is it UDP?), with the intention that if the system gets heavily loaded
> > we'll lose some stats in the interest of not bogging down all the
> > backends. It seems that doesn't work so hot on FreeBSD. :(
> 
> I am working on a patch for 8.2 to fix that for all platforms.

Excellent. Did I miss discussion of that or have you not mentioned it?
I'm curious as to how you're fixing it...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: postgresql and process titles

From
Bruce Momjian
Date:
Jim C. Nasby wrote:
> On Tue, Jun 13, 2006 at 02:10:15PM -0400, Bruce Momjian wrote:
> > Jim C. Nasby wrote:
> > > FWIW, turning off stats_command_string substantially reduced
> > > this contention, so it appears the issue is somewhere in the stats code.
> > > This code sends stats messages to a different process via a socket (or
> > > is it UDP?), with the intention that if the system gets heavily loaded
> > > we'll lose some stats in the interest of not bogging down all the
> > > backends. It seems that doesn't work so hot on FreeBSD. :(
> > 
> > I am working on a patch for 8.2 to fix that for all platforms.
> 
> Excellent. Did I miss discussion of that or have you not mentioned it?
> I'm curious as to how you're fixing it...

The patches are in the hold queue:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold

Title is "Stats collector performance improvement".  I need someone to
test my patches on a non-BSD platform to move forward.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: postgresql and process titles

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Jim C. Nasby wrote:
>> Excellent. Did I miss discussion of that or have you not mentioned it?
>> I'm curious as to how you're fixing it...

> The patches are in the hold queue:
>     http://momjian.postgresql.org/cgi-bin/pgpatches_hold

That's talking about the stats code, which has approximately zip to do
with setproctitle (ps_status.c).  But IIRC that patch is on hold because
nobody particularly liked the approach it's taking.  I think we should
investigate rewriting the stats communication architecture entirely ---
in particular, do we really need the stats buffer process at all?  It'd
be interesting to see what happens if we just make the collector process
read the UDP socket directly.  Or alternatively drop the UDP socket in
favor of having the backends write directly to the collector process'
input pipe (not sure if this would port to Windows though).

As far as Kris' complaint goes, one thing that might be interesting is
to delay both the setproctitle call and the stats command message send
until the current command has been running a little while (say 100ms
or so).  The main objection I see to this is that it replaces a kernel
call that actually does some work with a kernel call to start a timer,
plus possibly a later kernel call to actually do the work.  Not clear
that there's a win there.  (If you're using statement_timeout it might
not matter, but if you aren't...)

Also not clear how you make the necessary actions happen when the timer
expires --- I seriously doubt it'd be safe to try to do either action
directly in a signal handler.
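
One common way around the signal-handler problem is to have the handler do nothing but set a flag, and to do the real work later from ordinary code at a safe point. The sketch below models that pattern without real signals or timers; the class, its method names and the `publish` callback (standing in for setproctitle or the stats message send) are hypothetical.

```python
class DeferredTitle:
    """Publish a command's title only if a timer fires while it is running."""

    def __init__(self, publish):
        self.publish = publish        # callback standing in for setproctitle()
        self.pending_title = None     # title of the running command, if unpublished
        self.timer_fired = False      # set by the "handler", read at safe points

    def start(self, title):
        self.pending_title = title
        self.timer_fired = False

    def on_timer(self):
        # What a signal handler would do: set a flag and nothing else.
        self.timer_fired = True

    def check(self):
        # Called from normal (non-handler) code at a safe point.
        if self.timer_fired and self.pending_title is not None:
            self.publish(self.pending_title)
            self.pending_title = None
        self.timer_fired = False

    def finish(self):
        self.pending_title = None
        self.timer_fired = False

def simulate(commands):
    """commands: iterable of (title, outlives_timer). Returns published titles."""
    published = []
    tracker = DeferredTitle(published.append)
    for title, outlives_timer in commands:
        tracker.start(title)
        if outlives_timer:
            tracker.on_timer()   # timer expired while the command was running
            tracker.check()      # next safe point in the main code path
        tracker.finish()
    return published
```

In this model, short commands never publish a title and only long-running ones do. It says nothing about the cost of arming the timer for every command, which is the objection raised above.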
        regards, tom lane


Re: postgresql and process titles

From
Martijn van Oosterhout
Date:
On Tue, Jun 13, 2006 at 04:35:24PM -0400, Tom Lane wrote:
> ...  The main objection I see to this is that it replaces a kernel
> call that actually does some work with a kernel call to start a timer,
> plus possibly a later kernel call to actually do the work.  Not clear
> that there's a win there.

And of course it's an almost guaranteed loss on systems that don't
require a syscall to set the proc title.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: postgresql and process titles

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Jim C. Nasby wrote:
> >> Excellent. Did I miss discussion of that or have you not mentioned it?
> >> I'm curious as to how you're fixing it...
> 
> > The patches are in the hold queue:
> >     http://momjian.postgresql.org/cgi-bin/pgpatches_hold
> 
> That's talking about the stats code, which has approximately zip to do
> with setproctitle (ps_status.c).  But IIRC that patch is on hold because

I thought the bug reporter was asking about the stats code as well.

> nobody particularly liked the approach it's taking.  I think we should
> investigate rewriting the stats communication architecture entirely ---
> in particular, do we really need the stats buffer process at all?  It'd
> be interesting to see what happens if we just make the collector process
> read the UDP socket directly.  Or alternatively drop the UDP socket in

Agreed, that's what I would prefer, and I tested something like that, but
even pulling the packets into the buffer and throwing them away had
significant overhead, so I think the timeout trick has to be employed as
well as going to a single process.

> favor of having the backends write directly to the collector process'
> input pipe (not sure if this would port to Windows though).
> 
> As far as Kris' complaint goes, one thing that might be interesting is
> to delay both the setproctitle call and the stats command message send
> until the current command has been running a little while (say 100ms
> or so).  The main objection I see to this is that it replaces a kernel
> call that actually does some work with a kernel call to start a timer,
> plus possibly a later kernel call to actually do the work.  Not clear
> that there's a win there.  (If you're using statement_timeout it might
> not matter, but if you aren't...)
> 
> Also not clear how you make the necessary actions happen when the timer
> expires --- I seriously doubt it'd be safe to try to do either action
> directly in a signal handler.

Right.  What if the postmaster signals the backends once a second to do
their reporting?  I am not sure what the final solution will be, but we
_need_ one based on the performance numbers I and others have seen. 
Could we have PGPROC have a reporting boolean that is set every second
and somehow checked by each backend?
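
A rough model of the "reporting boolean" idea, to make the question concrete. It is purely illustrative: `Backend`, `report_due` and `tick` are hypothetical names, a plain attribute stands in for a field in PGPROC, and no real shared memory or signalling is involved.

```python
class Backend:
    def __init__(self, name):
        self.name = name
        self.report_due = False   # stands in for a boolean in a PGPROC-like slot
        self.current = "idle"
        self.reports = []         # what would have been sent or shown in ps

    def run(self, command):
        self.current = command
        self.maybe_report()

    def maybe_report(self):
        # Cheap check on every command; the expensive work happens only when due.
        if self.report_due:
            self.report_due = False
            self.reports.append(self.current)

def tick(backends):
    """What a once-per-second ticker would do: mark every backend as due."""
    for backend in backends:
        backend.report_due = True

def demo():
    backend = Backend("b1")
    for i in range(1000):
        backend.run(f"SELECT {i}")   # no tick yet: nothing is reported
    tick([backend])
    backend.run("SELECT 1000")       # first command after the tick reports
    backend.run("SELECT 1001")       # flag already cleared: no report
    return backend.reports
```

In this model a backend reports at most once per tick regardless of how many commands it runs, but the ticker has to touch every backend's flag on each tick, idle or not.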

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: postgresql and process titles

From
"Jim C. Nasby"
Date:
On Tue, Jun 13, 2006 at 05:05:31PM -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Jim C. Nasby wrote:
> > >> Excellent. Did I miss discussion of that or have you not mentioned it?
> > >> I'm curious as to how you're fixing it...
> > 
> > > The patches are in the hold queue:
> > >     http://momjian.postgresql.org/cgi-bin/pgpatches_hold
> > 
> > That's talking about the stats code, which has approximately zip to do
> > with setproctitle (ps_status.c).  But IIRC that patch is on hold because
> 
> I thought the bug reporter was asking about the stats code as well.
It did get brought up...

> > As far as Kris' complaint goes, one thing that might be interesting is
> > to delay both the setproctitle call and the stats command message send
> > until the current command has been running a little while (say 100ms
> > or so).  The main objection I see to this is that it replaces a kernel
> > call that actually does some work with a kernel call to start a timer,
> > plus possibly a later kernel call to actually do the work.  Not clear
> > that there's a win there.  (If you're using statement_timeout it might
> > not matter, but if you aren't...)
> > 
> > Also not clear how you make the necessary actions happen when the timer
> > expires --- I seriously doubt it'd be safe to try to do either action
> > directly in a signal handler.
> 
> Right.  What if the postmaster signals the backends once a second to do
> their reporting?  I am not sure what the final solution will be, but we
> _need_ one based on the performance numbers I and others have seen. 
> Could we have PGPROC have a reporting boolean that is set every second
> and somehow checked by each backend?

One second might be a bit more delay than some folks want... it would be
nice if this was tuneable. Also, what would the overhead on this look
like if there's a large number of idle backends?

It does sound more appealing than setting a timer every time you start a
transaction, though...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: postgresql and process titles

From
Jim Nasby
Date:
On Jun 12, 2006, at 10:38 AM, Kris Kennaway wrote:
>>> FYI, the biggest source of contention is via semop() - it might be
>>> possible to optimize that some more in FreeBSD, I don't know.
>>
>> Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with  
>> a lot
>> of procs in either semwait or semlock. :(
>
> Part of that is Giant contention again; for example on 6.x semop() and
> setproctitle() both want to acquire it, so they'll fight with each
> other and with anything else on the system that wants Giant
> (e.g. IPv6, or the USB stack, etc).  As I mentioned Giant is not an
> issue here going forward, but there is still as much lock contention
> just between semop() calls running on different CPUs.  It may be
> possible for someone to implement more fine-grained locking here, but
> I don't know if there is available interest.

BTW, there's another FBSD performance oddity I've run across. Running

pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &

which dumps the email_contrib table to bzip2 then to disk, the OS  
won't use more than 1 CPU on an SMP system... unless the data is  
cached. According to both gstat and systat -v, the system isn't I/O  
bound; both are reporting the RAID10 with that table on it as only  
about 10% busy. If I let that command run for a bit then cancel it  
and re-start it so that the beginning of that table is in cache, it  
will use one entire CPU for bzip2, which is what I'd expect to happen.




Re: postgresql and process titles

From
"Marc G. Fournier"
Date:
On Tue, 13 Jun 2006, Jim Nasby wrote:

>
> On Jun 12, 2006, at 10:38 AM, Kris Kennaway wrote:
>>>> FYI, the biggest source of contention is via semop() - it might be
>>>> possible to optimize that some more in FreeBSD, I don't know.
>>> 
>>> Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
>>> of procs in either semwait or semlock. :(
>> 
>> Part of that is Giant contention again; for example on 6.x semop() and
>> setproctitle() both want to acquire it, so they'll fight with each
>> other and with anything else on the system that wants Giant
>> (e.g. IPv6, or the USB stack, etc).  As I mentioned Giant is not an
>> issue here going forward, but there is still as much lock contention
>> just between semop() calls running on different CPUs.  It may be
>> possible for someone to implement more fine-grained locking here, but
>> I don't know if there is available interest.
>
> BTW, there's another FBSD performance oddity I've run across. Running
>
> pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &
>
> which dumps the email_contrib table to bzip2 then to disk, the OS won't use 
> more than 1 CPU on an SMP system... unless the data is cached. According to 
> both gstat and systat -v, the system isn't I/O bound; both are reporting the 
> RAID10 with that table on it as only about 10% busy. If I let that command 
> run for a bit then cancel it and re-start it so that the beginning of that 
> table is in cache, it will use one entire CPU for bzip2, which is what I'd 
> expect to happen.

What version of FreeBSD are you dealing with here?  I'm guessing at least 
6.x, but just figured I'd clarify ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664


Re: postgresql and process titles

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Right.  What if the postmaster signals the backend once a second to do
> their reporting.  I am not sure what the final solution will be, but we
> _need_ one based on the performance numbers I and others have seen. 
> Could we have PGPROC have a reporting boolean that is set every second
> and somehow checked by each backend?

I don't see any point in involving the postmaster in it.  What might be
interesting is to replace the current backend timer-interrupt handling
by a free-running cyclic interrupt every N milliseconds (the resolution
of the statement_timeout and deadlock_check delays would then be no
better than N milliseconds, since those actions would occur at the next
cyclic interrupt after the desired time expires).  We could possibly
drive stats reports and ps_status updates from this, as well as sampling
EXPLAIN ANALYZE if anyone cares to pursue that.

Complaints I can foresee:

* lots of cycles wasted in idle backends.  Possibly a backend that's not
received any command for a second or two could shut down its interrupt
until it next gets a command.

* not clear whether the interrupt happens when waiting for I/O.  I
already mentioned that this would be a problem for EXPLAIN ANALYZE,
but it might be no big deal for the other uses.
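
To put a number on the resolution cost of a cyclic interrupt: an action due
at time t can only run at the first tick at or after t. The sketch below is
plain arithmetic, not backend code. The name next_tick_ms and its parameters
are hypothetical, and it assumes ticks fire at N, 2N, 3N, ... milliseconds
after the timer is started.

```python
def next_tick_ms(requested_ms, tick_ms):
    """Return when an action requested for requested_ms would actually run
    if it can only run on a cyclic tick (ticks at tick_ms, 2*tick_ms, ...)."""
    if tick_ms <= 0:
        raise ValueError("tick_ms must be positive")
    # Integer ceiling division, so there is no floating-point rounding.
    ticks = -(-requested_ms // tick_ms)
    # A request at or before time 0 still has to wait for the first tick.
    return max(ticks, 1) * tick_ms


# A 25 ms deadline with a 10 ms tick is serviced at 30 ms.
print(next_tick_ms(25, 10))  # prints 30
```

So with N = 10, a statement_timeout of 25 ms would fire at 30 ms, and the
extra delay for any deadline is always less than N.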
        regards, tom lane


Re: postgresql and process titles

From
"Jim C. Nasby"
Date:
On Tue, Jun 13, 2006 at 11:13:55PM -0300, Marc G. Fournier wrote:
> >BTW, there's another FBSD performance oddity I've run across. Running
> >
> >pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &
> >
> >which dumps the email_contrib table to bzip2 then to disk, the OS won't 
> >use more than 1 CPU on an SMP system... unless the data is cached. 
> >According to both gstat and systat -v, the system isn't I/O bound; both 
> >are reporting the RAID10 with that table on it as only about 10% busy. If 
> >I let that command run for a bit then cancel it and re-start it so that 
> >the beginning of that table is in cache, it will use one entire CPU for 
> >bzip2, which is what I'd expect to happen.
> 
> What version of FreeBSD are you dealing with here?  I'm guessing at least 
> 6.x, but just figured I'd clarify ...

FreeBSD 6.0-STABLE #6: Fri Dec  9 19:14:19 UTC 2005


Re: postgresql and process titles

From
Martijn van Oosterhout
Date:
On Tue, Jun 13, 2006 at 10:36:27PM -0400, Tom Lane wrote:
> * not clear whether the interrupt happens when waiting for I/O.  I
> already mentioned that this would be a problem for EXPLAIN ANALYZE,
> but it might be no big deal for the other uses.

If we're going by UNIX signal semantics, in theory any interruptible
function (as listed on the signal manpage) would be interrupted every N
milliseconds to handle the interrupt. So yes, if possible the interrupt
will be serviced anyway, even during I/O, by aborting and restarting the
I/O continuously. Not terribly efficient.

With one big exception: sometimes I/O is non-interruptible (the good
old D state). In this case the interrupts will happen but will simply
be queued and in fact will all be dropped except the last one
(non-realtime signals are never stacked). The handler will probably be
called the moment it returns to user-space.

Note: Do we want SIGPROF, SIGALRM or SIGVTALRM? There's apparently a
distinction, see: http://en.wikipedia.org/wiki/SIGPROF

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.

Re: postgresql and process titles

From
Greg Stark
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:

> With one big exception: sometimes I/O is non-interruptible (the good
> old D state). In this case the interrupts will happen but will simply
> be queued and in fact will all be dropped except the last one
> (non-realtime signals are never stacked). The handler will probably be
> called the moment it returns to user-space.

If backends store their current status in shared memory then a separate
process entirely can receive the interrupts, scan through the shared memory
process states and do the accounting. It would never be interrupting i/o since
that process would never be doing i/o. It could use real-time signals, reads
on /dev/rtc, or whatever other methods exist for scheduling periodic events
since it would be doing nothing but handling these interrupts.

The neat thing about this is that it would be possible to look at a process
from outside instead of having to hijack the application to get feedback. You
could find information like total time spent in i/o versus cpu aggregated
across all queries from a single backend or for all backends.

The downside is that to handle things like EXPLAIN ANALYZE you would have to
provide enough information about plans to this accounting process for it to
store the information. Plans are currently purely local to the backend running
them.

Perhaps it would be enough to provide a unique id for the plan (sequential id
per backend would be enough). And a unique id for the plan node. The
accounting process wouldn't know anything more about what that node
represented but the backend could later query to fetch the information and
associate it with the plan.

-- 
greg



Re: postgresql and process titles

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> If backends store their current status in shared memory then a separate
> process entirely can receive the interrupts, scan through the shared memory
> process states and do the accounting.

This sounds good until you think about locking.  It'd be quite
impractical to implement anything as fine-grained as EXPLAIN ANALYZE
this way, because of the overhead involved in taking and releasing
spinlocks.

It could be practical as a replacement for stats_command_string
messages, though.

I'm not sure about replacing ps_status with this.  I don't think there
is a way for one process to set another's status (on most platforms
anyway).  You might argue that we could abandon ps_status reporting
altogether if we had something better, but I'm unconvinced ...
        regards, tom lane


Re: postgresql and process titles

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > If backends store their current status in shared memory then a separate
> > process entirely can receive the interrupts, scan through the shared memory
> > process states and do the accounting.
> 
> This sounds good until you think about locking.  It'd be quite
> impractical to implement anything as fine-grained as EXPLAIN ANALYZE
> this way, because of the overhead involved in taking and releasing
> spinlocks.

I'm not entirely convinced. The only other process that would be looking at
the information would be the statistics accumulator which would only be waking
up every 100ms or so. There would be no contention with other backends
reporting their info.

I was also thinking of reporting only basic information like which of a small
number of states the backend was in and which node of the plan is being
executed. If those are just integers then it might be possible to get away
without locking at all, storing them as sig_atomic_t. 

I think there would still be a fair amount of cache coherency overhead like we
see with the tas instruction on Xeons but only if that backend's monitor block
is actually being touched by the statistics accumulator process which would
presumably be only when running an EXPLAIN ANALYZE or other tracing facility.

-- 
greg



Re: postgresql and process titles

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> This sounds good until you think about locking.  It'd be quite
>> impractical to implement anything as fine-grained as EXPLAIN ANALYZE
>> this way, because of the overhead involved in taking and releasing
>> spinlocks.

> I'm not entirely convinced. The only other process that would be looking at
> the information would be the statistics accumulator which would only be waking
> up every 100ms or so. There would be no contention with other backends
> reporting their info.

The numbers I've been looking at lately say that heavy lock traffic is
expensive, particularly on SMP machines, even with zero contention.
Seems the cache coherency protocol costs a lot even when it's not doing
anything...
        regards, tom lane


Re: postgresql and process titles

From
"Jim C. Nasby"
Date:
On Wed, Jun 14, 2006 at 03:51:28PM -0400, Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> writes:
> >> This sounds good until you think about locking.  It'd be quite
> >> impractical to implement anything as fine-grained as EXPLAIN ANALYZE
> >> this way, because of the overhead involved in taking and releasing
> >> spinlocks.
> 
> > I'm not entirely convinced. The only other process that would be looking at
> > the information would be the statistics accumulator which would only be waking
> > up every 100ms or so. There would be no contention with other backends
> > reporting their info.
> 
> The numbers I've been looking at lately say that heavy lock traffic is
> expensive, particularly on SMP machines, even with zero contention.
> Seems the cache coherency protocol costs a lot even when it's not doing
> anything...

Are there any ways we could avoid the locking?

One idea would be to keep something akin to a FIFO, where the backend
would write records instead of updating/over-writing them, and the
reader process would only read records where there was no risk that they
were still being written. That would mean that the reader would need to
stay at least one record behind the backend, but that's probably
manageable.

The downside is more memory usage, but I think we could limit that by
also keeping track of what record the reader had last read. The backend
would check that, and if the reader was more than X records behind, the
backend would update the most recent record it had written, instead of
writing a new one. That would place an effective limit on how much
memory was consumed.

But... I have no idea how exactly shared memory works, so maybe this
plan is fundamentally broken. But hopefully there's some way to handle
the locking problems, because a separate reader process does sound like
an interesting possibility.
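
A minimal sketch of the record log described above, as a single-process toy
model only. The class name StatusLog and the max_lag parameter (the "X
records behind" limit) are made up for illustration. It ignores memory
ordering between CPUs and any cost of sharing the two sequence counters, so
it shows the bookkeeping, not a working shared-memory design.

```python
class StatusLog:
    """Single-writer, single-reader record log (toy model, one process)."""

    def __init__(self, max_lag):
        if max_lag < 1:
            raise ValueError("max_lag must be at least 1")
        self.max_lag = max_lag
        # At most max_lag + 1 records are ever outstanding, so this bounds memory.
        self.slots = [None] * (max_lag + 1)
        self.write_seq = 0  # number of records the writer has started
        self.read_seq = 0   # next record the reader will consume

    def write(self, value):
        if self.write_seq - self.read_seq > self.max_lag:
            # Reader is too far behind: overwrite the newest record in place.
            self.slots[(self.write_seq - 1) % len(self.slots)] = value
        else:
            self.slots[self.write_seq % len(self.slots)] = value
            self.write_seq += 1

    def read(self):
        """Return unread records, always leaving the newest one untouched."""
        out = []
        while self.read_seq < self.write_seq - 1:
            out.append(self.slots[self.read_seq % len(self.slots)])
            self.read_seq += 1
        return out


log = StatusLog(max_lag=2)
for status in ["a", "b", "c", "d", "e"]:
    log.write(status)
print(log.read())  # prints ['a', 'b']; "c" and "d" were overwritten by "e"
```

The reader never touches the newest record, so it only sees "e" after the
backend has moved on to a later one.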


Re: postgresql and process titles

From
Martijn van Oosterhout
Date:
On Wed, Jun 14, 2006 at 03:21:55PM -0500, Jim C. Nasby wrote:
> One idea would be to keep something akin to a FIFO, where the backend
> would write records instead of updating/over-writing them, and the
> reader process would only read records where there was no risk that they
> were still being written. That would mean that the reader would need to
> stay at least one record behind the backend, but that's probably
> manageable.

The problem with a FIFO is that the pointers controlling where the
start/end are become the contention.

The only thing I can think of is to have the backends only write atomic
types and give each backend their own cache-line. The profiler would
simply wake up every N ms and read each value, accumulating them
somewhere.

No locking and there would be a maximum of one cache-line bounce per
backend per N ms. I'm not sure you can get better than that under the
situation.
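
As a toy model of that scheme, assume each backend publishes a tag for the
plan node it is currently running, and the profiler reads it every N ms.
The function sample_timeline and the node names below are hypothetical, and
the "timeline" is a scripted stand-in for a real backend, so this only shows
how sample counts relate to real time.

```python
from collections import Counter


def sample_timeline(timeline, interval_ms):
    """timeline is a list of (node, duration_ms) in execution order.
    Sample at t = 0, interval_ms, 2*interval_ms, ... and count which
    node was current at each sample."""
    counts = Counter()
    ends = []  # (end_time_ms, node) for each segment
    t_end = 0
    for node, duration_ms in timeline:
        t_end += duration_ms
        ends.append((t_end, node))
    t = 0
    i = 0
    while t < t_end:
        # Skip segments that finished at or before this sample time.
        while ends[i][0] <= t:
            i += 1
        counts[ends[i][1]] += 1
        t += interval_ms
    return counts


# 70 ms in a scan, 25 ms in a sort, then 5 more ms in the scan.
counts = sample_timeline([("seqscan", 70), ("sort", 25), ("seqscan", 5)], 10)
print(counts["seqscan"], counts["sort"])  # prints 7 3
```

Multiplying by the 10 ms interval gives 70 ms and 30 ms against a true 75 ms
and 25 ms, which is the kind of error sampling introduces for short-lived
nodes.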

Have a nice day,

Re: postgresql and process titles

From
Greg Stark
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:

> > The numbers I've been looking at lately say that heavy lock traffic is
> > expensive, particularly on SMP machines, even with zero contention.
> > Seems the cache coherency protocol costs a lot even when it's not doing
> > anything...
> 
> Are there any ways we could avoid the locking?

Well if all we want to do is reproduce the current functionality of EXPLAIN
ANALYZE, all you need is a single sig_atomic_t int that you store the address
of the current node in. When you're done with the query you ask the central
statistics accumulator for the counts of how many times it saw each node
address.

But that's sort of skipping some steps. The reason tas is slow is because it
needs to ensure cache coherency. Even just storing an int into shared memory
when its cache line is owned by the processor running the stats accumulator
would have the same overhead.

That said I don't think it would actually be as much. TAS has to ensure no
other processor overwrites the data from this processor. Whereas merely
storing into a sig_atomic_t just has to ensure that you don't get a partially
written integer.

And it seems sort of silly to be doing this much work just to reproduce the
current functionality even if it's lower overhead. The big advantage of a
separate process would be that it could do much more. But that would require
pushing some representation of the plan out to the stats collector, storing
more information about the state than just what node you're at currently such
as whether you're in the midst of index lookups, sequential scans, operator
and function execution, etc.

-- 
greg



Re: postgresql and process titles

From
"Jim C. Nasby"
Date:
On Wed, Jun 14, 2006 at 10:37:59PM +0200, Martijn van Oosterhout wrote:
> On Wed, Jun 14, 2006 at 03:21:55PM -0500, Jim C. Nasby wrote:
> > One idea would be to keep something akin to a FIFO, where the backend
> > would write records instead of updating/over-writing them, and the
> > reader process would only read records where there was no risk that they
> > were still being written. That would mean that the reader would need to
> > stay at least one record behind the backend, but that's probably
> > manageable.
> 
> The problem with a FIFO is that the pointers controlling where the
> start/end are become the contention.
Even if what's being written is a single byte/word and we're taking
intentional steps to make sure that reading a pointer that's off by 1
isn't an issue? Is the issue that if CPU A writes a pointer and CPU B
immediately tries to read it that it could get old data from its cache?

> The only thing I can think of is to have the backends only write atomic
> types and give each backend their own cache-line. The profiler would
> simply wake up every N ms and read each value, accumulating them
> somewhere.
Would everything we need fit in a cache line?

> No locking and there would be a maximum of one cache-line bounce per
> backend per N ms. I'm not sure you can get better than that under the
> situation.

What scheduler timer resolution do modern OSes use? It's likely that on
a system with more runnable processes than processors, each process will
only run for N ms anyway...

Speaking of which... don't the variables involved in scheduling mean
we couldn't use sampling to calculate cost in EXPLAIN ANALYZE? Or is the
assumption just that it will affect all nodes evenly?


Re: postgresql and process titles

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Well if all we want to do is reproduce the current functionality of EXPLAIN
> ANALYZE, all you need is a single sig_atomic_t int that you store the address
> of the current node in.

Do I need to point out that sig_atomic_t can't portably be assumed to be
wider than char?

We do currently assume that TransactionId can be read/written
atomically, but (a) that's an int not a pointer, and (b) the assumption
is cruft that we really ought to get rid of.

In any case, speculating about whether we can do something useful with
atomic types ignores the main real problem the thread is about.  Anybody
remember process titles and current command strings?
        regards, tom lane


Re: postgresql and process titles

From
Greg Stark
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:

> Even if what's being written is a single byte/word and we're taking
> intentional steps to make sure that reading a pointer that's off by 1
> isn't an issue? Is the issue that if CPU A writes a pointer and CPU B
> immediately tries to read it that it could get old data from its cache?

Well for applications like this you don't care whether you get the old data or
the new data. You just want to be sure you don't get half of each.

However:

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Do I need to point out that sig_atomic_t can't portably be assumed to be
> wider than char?

I didn't actually realize that. That would mean EXPLAIN ANALYZE would be
limited to counting times for 256 plan nodes and wouldn't really be much of an
improvement over the existing infrastructure.

> In any case, speculating about whether we can do something useful with
> atomic types ignores the main real problem the thread is about.  Anybody
> remember process titles and current command strings?

Oops, I thought this was the ANALYZE patch thread.

-- 
greg



Re: postgresql and process titles

From
Kris Kennaway
Date:
On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:

> > > Can you provide the actual commands you used to setup and run the test?
> >
> > I actually forget all the steps I needed to do to get super-smack
> > working with postgresql since it required a lot of trial and error for
> > a database newbie like me (compiling it from the
> > benchmarks/super-smack port was trivial, but unlike mysql it required
> > configuring the database by hand - this should hopefully be more
> > obvious to someone familiar with pgsql though).
> >
> > It would be great if someone on your end could make this easier, BTW -
> > e.g. at least document the steps.  Also super-smack should be changed
> > to allow use via a local socket with pgsql (this is the default with
> > mysql) - this avoids measuring network stack overhead.
>
> Unless supersmack has improved substantially, you're unlikely to find
> much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3
> (http://sourceforge.net/projects/osdldbt) is much more realistic (based
> on TPC-C and TPC-H).

Thanks for the reference, I'll check it out.  Brain-dead or not though
(and I've also seen evidence that it is, e.g. it does a hell of a lot
of 1-byte reads), it's a well known benchmark that seems to be fairly
widely used, so there's still some value in making it perform well
(but not at the expense of other things).  However my interest is not
so much for measuring database performance as for measuring kernel
performance and trying to optimize bottlenecks.

> > > > FYI, the biggest source of contention is via semop() - it might be
> > > > possible to optimize that some more in FreeBSD, I don't know.
> > >
> > > Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
> > > of procs in either semwait or semlock. :(
> >
> > Part of that is Giant contention again; for example on 6.x semop() and
> > setproctitle() both want to acquire it, so they'll fight with each
> > other and with anything else on the system that wants Giant
> > (e.g. IPv6, or the USB stack, etc).  As I mentioned Giant is not an
> > issue here going forward, but there is still as much lock contention
> > just between semop() calls running on different CPUs.  It may be
> > possible for someone to implement more fine-grained locking here, but
> > I don't know if there is available interest.
>
> FWIW, turning off stats_command_string substantially reduced
> this contention, so it appears the issue is somewhere in the stats code.
> This code sends stats messages to a different process via a socket (or
> is it UDP?), with the intention that if the system gets heavily loaded
> we'll lose some stats in the interest of not bogging down all the
> backends. It seems that doesn't work so hot on FreeBSD. :(

It could be it's making assumptions about what is cheap and what is
not, which are not true universally.  e.g. you might be adding a new
contention point that is even worse.  I didn't notice a change in
behaviour as I varied the load, so perhaps it was not visible on this
benchmark or I wasn't looking in the right place.

Kris

Re: postgresql and process titles

From
Kris Kennaway
Date:
On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:

> Unless supersmack has improved substantially, you're unlikely to find
> much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3
> (http://sourceforge.net/projects/osdldbt) is much more realistic (based
> on TPC-C and TPC-H).

Have you tried to compile this on FreeBSD?  It looks like it (dbt1 at
least) will need a moderate amount of hacking - there are some Linux
assumptions in the source and the configure script makes assumptions
about where things are installed that cannot be overridden on the
commandline.

Kris

Re: postgresql and process titles

From
Kris Kennaway
Date:
On Tue, Jun 13, 2006 at 03:55:38PM -0500, Jim Nasby wrote:
>
> On Jun 12, 2006, at 10:38 AM, Kris Kennaway wrote:
> >>>FYI, the biggest source of contention is via semop() - it might be
> >>>possible to optimize that some more in FreeBSD, I don't know.
> >>
> >>Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with
> >>a lot
> >>of procs in either semwait or semlock. :(
> >
> >Part of that is Giant contention again; for example on 6.x semop() and
> >setproctitle() both want to acquire it, so they'll fight with each
> >other and with anything else on the system that wants Giant
> >(e.g. IPv6, or the USB stack, etc).  As I mentioned Giant is not an
> >issue here going forward, but there is still as much lock contention
> >just between semop() calls running on different CPUs.  It may be
> >possible for someone to implement more fine-grained locking here, but
> >I don't know if there is available interest.
>
> BTW, there's another FBSD performance oddity I've run across. Running
>
> pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &
>
> which dumps the email_contrib table to bzip2 then to disk, the OS
> won't use more than 1 CPU on an SMP system... unless the data is
> cached. According to both gstat and systat -v, the system isn't I/O
> bound; both are reporting the RAID10 with that table on it as only
> about 10% busy. If I let that command run for a bit then cancel it
> and re-start it so that the beginning of that table is in cache, it
> will use one entire CPU for bzip2, which is what I'd expect to happen.

Hmm, odd... maybe something with the scheduler.  I'd need access to a
test case to be able to figure it out though.

Kris

Re: postgresql and process titles

From
Rod Taylor
Date:
On Tue, 2006-06-13 at 14:18 -0400, Kris Kennaway wrote:
> On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:
> 
> > Unless supersmack has improved substantially, you're unlikely to find
> > much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3
> > (http://sourceforge.net/projects/osdldbt) is much more realistic (based
> > on TPC-C and TPC-H).
> 
> Have you tried to compile this on FreeBSD?  It looks like it (dbt1 at
> least) will need a moderate amount of hacking - there are some Linux
> assumptions in the source and the configure script makes assumptions
> about where things are installed that cannot be overridden on the
> commandline.

I did have dbt2 pretty close to functional on FreeBSD a year ago but
it's probably gone back into linuxisms since then.



Re: postgresql and process titles

From
Rod Taylor
Date:
> > I did have dbt2 pretty close to functional on FreeBSD a year ago but
> > it's probably gone back into linuxisms since then.
> 
> :(
> 
> I won't have the chance to work on this further for another 2 months,
> but if you have patches I could see about picking up on them when I
> get back.

Everything has been applied to the dbt2 tree.




Re: postgresql and process titles

From
Kris Kennaway
Date:
On Thu, Jun 15, 2006 at 11:34:52PM -0400, Rod Taylor wrote:
> On Tue, 2006-06-13 at 14:18 -0400, Kris Kennaway wrote:
> > On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:
> >
> > > Unless supersmack has improved substantially, you're unlikely to find
> > > much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3
> > > (http://sourceforge.net/projects/osdldbt) is much more realistic (based
> > > on TPC-C and TPC-H).
> >
> > Have you tried to compile this on FreeBSD?  It looks like it (dbt1 at
> > least) will need a moderate amount of hacking - there are some Linux
> > assumptions in the source and the configure script makes assumptions
> > about where things are installed that cannot be overridden on the
> > commandline.
>
> I did have dbt2 pretty close to functional on FreeBSD a year ago but
> it's probably gone back into linuxisms since then.

:(

I won't have the chance to work on this further for another 2 months,
but if you have patches I could see about picking up on them when I
get back.

Kris


Re: postgresql and process titles

From
Kris Kennaway
Date:
On Fri, Jun 16, 2006 at 07:56:30AM -0400, Rod Taylor wrote:
> > > I did have dbt2 pretty close to functional on FreeBSD a year ago but
> > > it's probably gone back into linuxisms since then.
> >
> > :(
> >
> > I won't have the chance to work on this further for another 2 months,
> > but if you have patches I could see about picking up on them when I
> > get back.
>
> Everything has been applied to the dbt2 tree.

Cool!

Kris

Re: postgresql and process titles

From
Jim Nasby
Date:
Moving to osdldbt-general and dropping Tom and Marc.

On Jun 13, 2006, at 1:18 PM, Kris Kennaway wrote:
> On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:
>
>> Unless supersmack has improved substantially, you're unlikely to find
>> much interest. Last I heard it was a pretty brain-dead benchmark.  
>> DBT2/3
>> (http://sourceforge.net/projects/osdldbt) is much more realistic  
>> (based
>> on TPC-C and TPC-H).
>
> Have you tried to compile this on FreeBSD?  It looks like it (dbt1 at
> least) will need a moderate amount of hacking - there are some Linux
> assumptions in the source and the configure script makes assumptions
> about where things are installed that cannot be overridden on the
> commandline.

Yeah, there's a number of issues that would need to be addressed. The  
biggest problem as I see it is that all the framework to run tests is  
pretty tied together, without any modularity. The run script itself  
launches all the stats collecting stuff, assumes it needs to happen  
on the local machine, etc.

There is consensus that this kind of thing needs to be done to  
improve dbt*, it's just a matter of doing it. Also, it's desirable to  
improve the benchmarks so that they can store results in a database,  
and set them up so that the monitoring stuff is common to all the  
different tests.




Re: postgresql and process titles

From
Jim Nasby
Date:
On Jun 13, 2006, at 9:42 PM, Kris Kennaway wrote:
>> BTW, there's another FBSD performance oddity I've run across.  
>> Running
>>
>> pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &
>>
>> which dumps the email_contrib table to bzip2 then to disk, the OS
>> won't use more than 1 CPU on an SMP system... unless the data is
>> cached. According to both gstat and systat -v, the system isn't I/O
>> bound; both are reporting the RAID10 with that table on it as only
>> about 10% busy. If I let that command run for a bit then cancel it
>> and re-start it so that the beginning of that table is in cache, it
>> will use one entire CPU for bzip2, which is what I'd expect to  
>> happen.
>
> Hmm, odd... maybe something with the scheduler.  I'd need access to a
> test case to be able to figure it out though.

Well, pg_dump of any sizeable database piped through bzip2 or gzip  
should show this. Try:

pg_dump large_database | bzip2 > databasedump.sql.bz2