Thread: Re: [mux@FreeBSD.org: Re: Anyone interested in improving postgresql scaling?]

Kris Kennaway <kris@obsecurity.org> forwards:
> Yes but there are still a lot of wakeups to be avoided in the current
> System V semaphore code.  More specifically, not only do we wake up all
> the processes waiting on a single semaphore every time something changes,
> but we also wake up all processes waiting on *any* of the semaphores in
> the semaphore *set*, whatever the reason we're sleeping.

Ohhhh ... *that's* the problem.  Ugh.  Although we have a separate
semaphore for each PG backend, they're grouped into semaphore sets
(I think 16 active semaphores per set).  So a wakeup intended for one
process would uselessly send up to 15 others through the semop code.

The only thing we could do to fix that from our end would be to use
a smaller sema-set size on *BSD platforms.  Is the overhead per sema set
small enough to make this a sane thing to do?  Will we be likely to
run into system limits on the number of sets?
        regards, tom lane
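
For a rough sense of the trade-off raised above, the arithmetic can be
sketched in C.  This is only an illustration: the set size of 16 is the
figure Tom recalls ("I think"), the function names are hypothetical, and
nothing here is taken from PostgreSQL or FreeBSD code.

```c
#include <stddef.h>

/* Assumed figure from the message above ("I think 16 active semaphores per set"). */
#define SEMAS_PER_SET 16

/*
 * Number of semaphore sets needed when every backend gets one semaphore.
 * Ceiling division; returns 0 for a nonsensical set size of 0.
 */
size_t sets_needed(size_t n_backends, size_t per_set)
{
    if (per_set == 0)
        return 0;
    return (n_backends + per_set - 1) / per_set;
}

/*
 * Upper bound on the number of *other* sleepers in the same set that a
 * single wakeup can send through the semop code for nothing.
 */
size_t max_spurious_wakeups(size_t per_set)
{
    return per_set > 0 ? per_set - 1 : 0;
}
```

With 100 backends, sets of 16 need 7 sets and allow up to 15 needless
wakeups per wakeup; one semaphore per set needs 100 sets and allows none,
which is where the system limit on the number of sets starts to matter.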


Maxime Henrion <mux@freebsd.org> writes:
> Thanks for forwarding my mail, Kris!  To Tom: if you can get my mails
> to reach pgsql-hackers@ somehow that would be just great :-).

They'll get approved eventually, just like mine to the BSD lists will
get approved eventually ;-)

>> The only thing we could do to fix that from our end would be to use
>> a smaller sema-set size on *BSD platforms.  Is the overhead per sema set
>> small enough to make this a sane thing to do?  Will we be likely to
>> run into system limits on the number of sets?

> I'm not familiar enough with the PostgreSQL code to know what impact
> such a change could have, but since the problem is clearly on our
> side here, I would advise against doing changes in PostgreSQL that
> are likely to complicate the code for little gain.  We still haven't
> even fully measured how much the useless wakeups cost us since we're
> running into other contention problems with my patch that removes
> those.  And, as you point out, there are complications ensuing with
> respect to system limits (we already ask users to bump them when
> they install PostgreSQL).

OK, it was just an off-the-cuff idea.

> I think the high number of setproctitle() calls are more problematic
> to us at the moment, Kris can comment on that.

As of PG 8.2 it is possible to turn those off.  I don't think there's a
lot of enthusiasm for turning them off by default ... at least not yet.
But it might make sense to point out in the PG documentation that
update_process_title is particularly costly on platforms X, Y, and Z.
Do you know if this issue affects all the BSDen equally?
        regards, tom lane


Re: [mux@FreeBSD.org: Re: Anyone interested in improving postgresql scaling?]

From: Mark Kirkwood
Tom Lane wrote:

> 
>> I think the high number of setproctitle() calls are more problematic
>> to us at the moment, Kris can comment on that.
> 
> As of PG 8.2 it is possible to turn those off.  I don't think there's a
> lot of enthusiasm for turning them off by default ... at least not yet.
> But it might make sense to point out in the PG documentation that
> update_process_title is particularly costly on platforms X, Y, and Z.
> Do you know if this issue affects all the BSDen equally?
> 


Might be good to turn off by default for the 8.2+ Postgresql versions in 
the FreeBSD ports tree (looks like postgresql.conf.sample is being 
patched anyway, so pretty easy to amend).

Cheers

Mark



Kris Kennaway <kris@obsecurity.org> writes:
>>>>> I think the high number of setproctitle() calls are more problematic
>>>>> to us at the moment, Kris can comment on that.

> Since we've basically had it handed to us that calling setproctitle()
> thousands of times per second is something that real applications now
> do, we're pretty much forced to work on making it cheaper.
> ...
> However this won't help all the existing systems out there (including
> other affected OSes), so it would be great if you guys could meet us
> half way and find a way to make postgresql rate-limit these calls by
> default to some suitable compromise rate, like once/second or
> whatever.

Well, the thing is, we've pretty much had it handed to us that
current-command indicators that aren't up to date are not very useful.
So rate-limited updates strike me as a useless compromise.  We have
the "real" solution (status advertised in PG's shared memory) already,
so the question in my mind is just how fast DBAs will wish to transition
to looking at "select * from pg_stat_activity" instead of looking at
"ps auxww".

I don't see anything wrong at all with making update_process_title
default to "off" in BSD-specific packaging of Postgres.  It's a harder
sell to turn it off by default everywhere, because of all them Linux
users for whom that's just taking away a convenient status viewing
method.  I think we might get there eventually, but we need a decent
interval to wean people away from the old method.

[ Disclaimer: I work for Red Hat, so am unlikely to favor doing anything
that is a loss on Linux.  But I do use and like other platforms too;
just don't happen to have any BSD in-house currently, unless you're
willing to count Darwin as BSD. ]
        regards, tom lane


Re: [mux@FreeBSD.org: Re: Anyone interested in improving postgresql scaling?]

From: Kris Kennaway
On Tue, Apr 10, 2007 at 08:23:36PM -0400, Tom Lane wrote:

> > I think the high number of setproctitle() calls are more problematic
> > to us at the moment, Kris can comment on that.
>
> As of PG 8.2 it is possible to turn those off.  I don't think there's a
> lot of enthusiasm for turning them off by default ... at least not yet.
> But it might make sense to point out in the PG documentation that
> update_process_title is particularly costly on platforms X, Y, and Z.
> Do you know if this issue affects all the BSDen equally?

It will likely affect them to some extent.  In fact the only platforms
it will not hurt on are those which have already jumped through
special hoops to make setproctitle() super-cheap.  I presume Linux is
in this category but don't know which others are, if any.

Since we've basically had it handed to us that calling setproctitle()
thousands of times per second is something that real applications now
do, we're pretty much forced to work on making it cheaper.  Hopefully
this is something that will be addressed over the next few months
(we're going to look at adding support for pages shared between libc
and kernel so this kind of thing can be done without requiring a
syscall).

However this won't help all the existing systems out there (including
other affected OSes), so it would be great if you guys could meet us
half way and find a way to make postgresql rate-limit these calls by
default to some suitable compromise rate, like once/second or
whatever.

Kris

Re: [mux@FreeBSD.org: Re: Anyone interested in improving postgresql scaling?]

From: Kris Kennaway
On Wed, Apr 11, 2007 at 12:50:06PM +1200, Mark Kirkwood wrote:
> Tom Lane wrote:
>
> >
> >>I think the high number of setproctitle() calls are more problematic
> >>to us at the moment, Kris can comment on that.
> >
> >As of PG 8.2 it is possible to turn those off.  I don't think there's a
> >lot of enthusiasm for turning them off by default ... at least not yet.
> >But it might make sense to point out in the PG documentation that
> >update_process_title is particularly costly on platforms X, Y, and Z.
> >Do you know if this issue affects all the BSDen equally?
> >
>
>
> Might be good to turn off by default for the 8.2+ Postgresql versions in
> the FreeBSD ports tree (looks like postgresql.conf.sample is being
> patched anyway, so pretty easy to amend).

Yeah, we might end up doing this, but I consider it a workaround.

Kris

Re: [mux@FreeBSD.org: Re: Anyone interested in improving postgresql scaling?]

From: Kris Kennaway
On Wed, Apr 11, 2007 at 01:03:50AM -0400, Tom Lane wrote:
> Kris Kennaway <kris@obsecurity.org> writes:
> >>>>> I think the high number of setproctitle() calls are more problematic
> >>>>> to us at the moment, Kris can comment on that.
>
> > Since we've basically had it handed to us that calling setproctitle()
> > thousands of times per second is something that real applications now
> > do, we're pretty much forced to work on making it cheaper.
> > ...
> > However this won't help all the existing systems out there (including
> > other affected OSes), so it would be great if you guys could meet us
> > half way and find a way to make postgresql rate-limit these calls by
> > default to some suitable compromise rate, like once/second or
> > whatever.
>
> Well, the thing is, we've pretty much had it handed to us that
> current-command indicators that aren't up to date are not very useful.
> So rate-limited updates strike me as a useless compromise.  We have
> the "real" solution (status advertised in PG's shared memory) already,
> so the question in my mind is just how fast DBAs will wish to transition
> to looking at "select * from pg_stat_activity" instead of looking at
> "ps auxww".

I don't get your argument - ps auxww is never going to be 100%
up-to-date because during the time the command is running the status
may change.  So we already know that stats being a fraction of a
second out of date are acceptable to users, because that's what may
happen when you run ps in the present model.  So you can use this to
get away with limiting updates to e.g. 10/second and in practise no
users will notice the difference.

Updating thousands of times a second just on the off chance that an
admin may one day run ps is completely inefficient (and has a huge
overhead on non-Linux systems, so it's demonstrably not a sensible way
to do things), and to the extent that there is a problem to be solved
it isn't even really solving it anyway.

If there really are users who find 10 proctitle updates/second an
unacceptably low update rate, then tune for the default case and
provide an option to allow them to override the rate limit to whatever
update rate they find appropriate.

Kris

Kris Kennaway <kris@obsecurity.org> writes:
> On Wed, Apr 11, 2007 at 01:03:50AM -0400, Tom Lane wrote:
>> Well, the thing is, we've pretty much had it handed to us that
>> current-command indicators that aren't up to date are not very useful.
>> So rate-limited updates strike me as a useless compromise.

> I don't get your argument - ps auxww is never going to be 100%
> up-to-date because during the time the command is running the status
> may change.

Of course.  But we have already done the update-once-every-half-second
bit --- that was how pg_stat_activity used to work --- and our users
made clear that it's not good enough.  So I don't see us expending
significant effort to convert the setproctitle code path to that
approach.  The clear way of the future for expensive-setproctitle
platforms is just to turn it off entirely and rely on the new
pg_stat_activity implementation.
        regards, tom lane


Re: [mux@FreeBSD.org: Re: Anyone interested in improving postgresql scaling?]

From: Gregory Stark
"Kris Kennaway" <kris@obsecurity.org> writes:

> If there really are users who find 10 proctitle updates/second an
> unacceptably low update rate, then tune for the default case and
> provide an option to allow them to override the rate limit to whatever
> update rate they find appropriate.

If you rate limit the naive way you would end up with info that's arbitrarily
old and out of date. To get something that's guaranteed not to be older than
some maximum age we would have to start with setting timers and setting the
proctitle in a signal handler which would be much more complex than what I
think you're imagining.
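
A minimal sketch of the "naive" rate limiting described above, in C, to
show why the displayed status can become arbitrarily stale.  All names are
hypothetical, the clock is passed in so the behaviour is easy to follow,
and copying into `shown` stands in for the real setproctitle() call; this
is not PostgreSQL code.

```c
#include <stdio.h>

#define TITLE_MAX 64

/* Hypothetical state for a rate-limited process title. */
struct title_limiter {
    double min_interval;     /* minimum seconds between applied updates */
    double last_update;      /* time of the last applied update */
    int    has_updated;      /* 0 until the first update is applied */
    char   shown[TITLE_MAX]; /* what "ps" would display */
};

void limiter_init(struct title_limiter *tl, double min_interval)
{
    tl->min_interval = min_interval;
    tl->last_update = 0.0;
    tl->has_updated = 0;
    tl->shown[0] = '\0';
}

/*
 * Naive limiter: an update arriving too soon after the previous one is
 * simply dropped.  Returns 1 if the title was applied, 0 if dropped.
 */
int limiter_set_title(struct title_limiter *tl, double now, const char *title)
{
    if (tl->has_updated && now - tl->last_update < tl->min_interval)
        return 0; /* dropped; nothing remembers that a newer title exists */

    /* Stand-in for setproctitle(): record what ps would show. */
    snprintf(tl->shown, sizeof tl->shown, "%s", title);
    tl->last_update = now;
    tl->has_updated = 1;
    return 1;
}
```

If a backend sets "SELECT" at t=0.00 and "idle" at t=0.01 with a 0.1 s
limit, the second update is dropped and ps keeps showing "SELECT" until the
backend next changes state, however long that takes.  Bounding the age
would need a timer to flush the pending title, which is the extra
complexity mentioned above.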


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: [mux@FreeBSD.org: Re: Anyone interested in improving postgresql scaling?]

From: Bruce Momjian
Tom Lane wrote:
> Kris Kennaway <kris@obsecurity.org> writes:
> > On Wed, Apr 11, 2007 at 01:03:50AM -0400, Tom Lane wrote:
> >> Well, the thing is, we've pretty much had it handed to us that
> >> current-command indicators that aren't up to date are not very useful.
> >> So rate-limited updates strike me as a useless compromise.
> 
> > I don't get your argument - ps auxww is never going to be 100%
> > up-to-date because during the time the command is running the status
> > may change.
> 
> Of course.  But we have already done the update-once-every-half-second
> bit --- that was how pg_stat_activity used to work --- and our users
> made clear that it's not good enough.  So I don't see us expending
> significant effort to convert the setproctitle code path to that
> approach.  The clear way of the future for expensive-setproctitle
> platforms is just to turn it off entirely and rely on the new
> pg_stat_activity implementation.

8.3 will modify less memory to update the process title than happened in
the past --- perhaps that will reduce the overhead, but I doubt it.  You
can test CVS HEAD to check it.

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
    EnterpriseDB                               http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [mux@FreeBSD.org: Re: Anyone interested in improving postgresql scaling?]

From: Maxime Henrion
Tom Lane wrote:
> Kris Kennaway <kris@obsecurity.org> forwards:
> > Yes but there are still a lot of wakeups to be avoided in the current
> > System V semaphore code.  More specifically, not only do we wake up all
> > the processes waiting on a single semaphore every time something changes,
> > but we also wake up all processes waiting on *any* of the semaphores in
> > the semaphore *set*, whatever the reason we're sleeping.

Thanks for forwarding my mail, Kris!  To Tom: if you can get my mails
to reach pgsql-hackers@ somehow that would be just great :-).

> Ohhhh ... *that's* the problem.  Ugh.  Although we have a separate
> semaphore for each PG backend, they're grouped into semaphore sets
> (I think 16 active semaphores per set).  So a wakeup intended for one
> process would uselessly send up to 15 others through the semop code.

Yes.

> The only thing we could do to fix that from our end would be to use
> a smaller sema-set size on *BSD platforms.  Is the overhead per sema set
> small enough to make this a sane thing to do?  Will we be likely to
> run into system limits on the number of sets?

I'm not familiar enough with the PostgreSQL code to know what impact
such a change could have, but since the problem is clearly on our
side here, I would advise against doing changes in PostgreSQL that
are likely to complicate the code for little gain.  We still haven't
even fully measured how much the useless wakeups cost us since we're
running into other contention problems with my patch that removes
those.  And, as you point out, there are complications ensuing with
respect to system limits (we already ask users to bump them when
they install PostgreSQL).

I'm looking forward to fixing/rewriting all of the FreeBSD SysV semaphore
code and am just waiting for a green light from my boss before doing
so.  Maybe someone will beat me to it, since it isn't such a big
change.

I think the high number of setproctitle() calls are more problematic
to us at the moment, Kris can comment on that.

Cheers,
Maxime


Re: [mux@FreeBSD.org: Re: Anyone interested in improving postgresql scaling?]

From: Kris Kennaway
On Thu, Apr 12, 2007 at 12:57:32PM -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Kris Kennaway <kris@obsecurity.org> writes:
> > > On Wed, Apr 11, 2007 at 01:03:50AM -0400, Tom Lane wrote:
> > >> Well, the thing is, we've pretty much had it handed to us that
> > >> current-command indicators that aren't up to date are not very useful.
> > >> So rate-limited updates strike me as a useless compromise.
> >
> > > I don't get your argument - ps auxww is never going to be 100%
> > > up-to-date because during the time the command is running the status
> > > may change.
> >
> > Of course.  But we have already done the update-once-every-half-second
> > bit --- that was how pg_stat_activity used to work --- and our users
> > made clear that it's not good enough.  So I don't see us expending
> > significant effort to convert the setproctitle code path to that
> > approach.  The clear way of the future for expensive-setproctitle
> > platforms is just to turn it off entirely and rely on the new
> > pg_stat_activity implementation.
>
> 8.3 will modify less memory to update the process title than happened in
> the past --- perhaps that will reduce the overhead, but I doubt it.  You
> can test CVS HEAD to check it.

Yeah, this is not relevant for BSD, it uses a syscall to set it (which
is why it has high overhead) instead of just modifying user memory.

Kris