Thread: Re: [mux@FreeBSD.org: Re: Anyone interested in improving postgresql scaling?]
Kris Kennaway <kris@obsecurity.org> forwards:
> Yes but there are still a lot of wakeups to be avoided in the current
> System V semaphore code. More specifically, not only do we wake up all
> the processes waiting on a single semaphore every time something changes,
> but we also wake up all processes waiting on *any* of the semaphores in
> the semaphore *set*, whatever the reason we're sleeping.

Ohhhh ... *that's* the problem. Ugh. Although we have a separate
semaphore for each PG backend, they're grouped into semaphore sets
(I think 16 active semaphores per set). So a wakeup intended for one
process would uselessly send up to 15 others through the semop code.

The only thing we could do to fix that from our end would be to use
a smaller sema-set size on *BSD platforms. Is the overhead per sema set
small enough to make this a sane thing to do? Will we be likely to
run into system limits on the number of sets?

regards, tom lane
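
To put a rough number on the "up to 15 others" remark, here is a small sketch that counts wakeups under two policies: waking every sleeper in the posted semaphore's set, versus waking only the sleeper on the posted semaphore. The model is an assumption made for illustration (one sleeping backend per semaphore, posts aimed at uniformly random backends); it is not the FreeBSD implementation, and count_wakeups is a hypothetical name.

```python
import random


def count_wakeups(n_backends, set_size, n_posts, seed=0):
    """Count processes woken per policy over n_posts semaphore posts.

    Assumed model: every backend sleeps on its own semaphore, and
    semaphores are grouped into consecutive sets of set_size.
    Returns (woken_per_set_policy, woken_per_semaphore_policy).
    """
    rng = random.Random(seed)
    per_set = 0  # policy A: wake everyone sleeping in the same set
    per_sem = 0  # policy B: wake only the intended sleeper
    for _ in range(n_posts):
        target = rng.randrange(n_backends)
        set_start = (target // set_size) * set_size
        set_end = min(set_start + set_size, n_backends)
        per_set += set_end - set_start  # all members of the target's set
        per_sem += 1
    return per_set, per_sem


if __name__ == "__main__":
    woken_set, woken_sem = count_wakeups(n_backends=64, set_size=16, n_posts=1000)
    print("set-wide wakeups:", woken_set)
    print("targeted wakeups:", woken_sem)
    print("useless wakeups: ", woken_set - woken_sem)
```

With full sets of 16, every post wakes 16 processes where 1 would do. How much each useless wakeup costs is not measured here, and per the thread it had not been fully measured at the time either.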
Maxime Henrion <mux@freebsd.org> writes:
> Thanks for forwarding my mail, Kris! To Tom: if you can get my mails
> to reach pgsql-hackers@ somehow that would be just great :-).

They'll get approved eventually, just like mine to the BSD lists will
get approved eventually ;-)

>> The only thing we could do to fix that from our end would be to use
>> a smaller sema-set size on *BSD platforms. Is the overhead per sema set
>> small enough to make this a sane thing to do? Will we be likely to
>> run into system limits on the number of sets?

> I'm not familiar enough with the PostgreSQL code to know what impact
> such a change could have, but since the problem is clearly on our
> side here, I would advise against doing changes in PostgreSQL that
> are likely to complicate the code for little gain. We still didn't
> even fully measure how much the useless wakeups cost us since we're
> running into other contention problems with my patch that removes
> those. And, as you point out, there are complications ensuing with
> respect to system limits (we already ask users to bump them when
> they install PostgreSQL).

OK, it was just an off-the-cuff idea.

> I think the high number of setproctitle() calls are more problematic
> to us at the moment, Kris can comment on that.

As of PG 8.2 it is possible to turn those off. I don't think there's a
lot of enthusiasm for turning them off by default ... at least not yet.
But it might make sense to point out in the PG documentation that
update_process_title is particularly costly on platforms X, Y, and Z.
Do you know if this issue affects all the BSDen equally?

regards, tom lane

From: Mark Kirkwood
Tom Lane wrote:
>
>> I think the high number of setproctitle() calls are more problematic
>> to us at the moment, Kris can comment on that.
>
> As of PG 8.2 it is possible to turn those off. I don't think there's a
> lot of enthusiasm for turning them off by default ... at least not yet.
> But it might make sense to point out in the PG documentation that
> update_process_title is particularly costly on platforms X, Y, and Z.
> Do you know if this issue affects all the BSDen equally?
>

Might be good to turn off by default for the 8.2+ PostgreSQL versions in
the FreeBSD ports tree (looks like postgresql.conf.sample is being
patched anyway, so pretty easy to amend).

Cheers

Mark
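
As an illustration of how small such a packaging tweak could be, the sketch below rewrites the text of a postgresql.conf.sample so that update_process_title is set to off. The setting name comes from the thread. The helper name disable_process_title and the assumption that the sample carries a commented-out default line are illustrative only; the actual FreeBSD port patches the file by its own means.

```python
import re

# Matches an active or commented-out update_process_title line.
_TITLE_LINE = re.compile(
    r"^[ \t]*#?[ \t]*update_process_title[ \t]*=.*$", re.MULTILINE
)


def disable_process_title(conf_text):
    """Return conf_text with update_process_title forced to off.

    Replaces the first existing (possibly commented-out) setting line,
    or appends the setting if the file does not mention it at all.
    Any trailing comment on the replaced line is dropped.
    """
    setting = "update_process_title = off"
    new_text, replaced = _TITLE_LINE.subn(setting, conf_text, count=1)
    if replaced == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += setting + "\n"
    return new_text


if __name__ == "__main__":
    sample = "max_connections = 100\n#update_process_title = on\n"
    print(disable_process_title(sample), end="")
```

A DBA who still wants live titles in ps output can set the value back to on in the installed postgresql.conf.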

Kris Kennaway <kris@obsecurity.org> writes:
>>>>> I think the high number of setproctitle() calls are more problematic
>>>>> to us at the moment, Kris can comment on that.

> Since we've basically had it handed to us that calling setproctitle()
> thousands of times per second is something that real applications now
> do, we're pretty much forced to work on making it cheaper.
> ...
> However this won't help all the existing systems out there (including
> other affected OSes), so it would be great if you guys could meet us
> half way and find a way to make postgresql rate-limit these calls by
> default to some suitable compromise rate, like once/second or
> whatever.

Well, the thing is, we've pretty much had it handed to us that
current-command indicators that aren't up to date are not very useful.
So rate-limited updates strike me as a useless compromise. We have
the "real" solution (status advertised in PG's shared memory) already,
so the question in my mind is just how fast DBAs will wish to transition
to looking at "select * from pg_stat_activity" instead of looking at
"ps auxww".

I don't see anything wrong at all with making update_process_title
default to "off" in BSD-specific packaging of Postgres. It's a harder
sell to turn it off by default everywhere, because of all them Linux
users for whom that's just taking away a convenient status viewing
method. I think we might get there eventually, but we need a decent
interval to wean people away from the old method.

[ Disclaimer: I work for Red Hat, so am unlikely to favor doing anything
that is a loss on Linux. But I do use and like other platforms too;
just don't happen to have any BSD in-house currently, unless you're
willing to count Darwin as BSD. ]

regards, tom lane

From: Kris Kennaway
On Tue, Apr 10, 2007 at 08:23:36PM -0400, Tom Lane wrote:
> > I think the high number of setproctitle() calls are more problematic
> > to us at the moment, Kris can comment on that.
>
> As of PG 8.2 it is possible to turn those off. I don't think there's a
> lot of enthusiasm for turning them off by default ... at least not yet.
> But it might make sense to point out in the PG documentation that
> update_process_title is particularly costly on platforms X, Y, and Z.
> Do you know if this issue affects all the BSDen equally?

It will likely affect them to some extent. In fact the only platforms
it will not hurt on are those which have already jumped through
special hoops to make setproctitle() super-cheap. I presume Linux is
in this category but don't know which others are, if any.

Since we've basically had it handed to us that calling setproctitle()
thousands of times per second is something that real applications now
do, we're pretty much forced to work on making it cheaper. Hopefully
this is something that will be addressed over the next few months
(we're going to look at adding support for pages shared between libc
and kernel so this kind of thing can be done without requiring a
syscall).

However this won't help all the existing systems out there (including
other affected OSes), so it would be great if you guys could meet us
half way and find a way to make postgresql rate-limit these calls by
default to some suitable compromise rate, like once/second or
whatever.

Kris

From: Kris Kennaway
On Wed, Apr 11, 2007 at 12:50:06PM +1200, Mark Kirkwood wrote:
> Tom Lane wrote:
> >
> >>I think the high number of setproctitle() calls are more problematic
> >>to us at the moment, Kris can comment on that.
> >
> >As of PG 8.2 it is possible to turn those off. I don't think there's a
> >lot of enthusiasm for turning them off by default ... at least not yet.
> >But it might make sense to point out in the PG documentation that
> >update_process_title is particularly costly on platforms X, Y, and Z.
> >Do you know if this issue affects all the BSDen equally?
> >
>
> Might be good to turn off by default for the 8.2+ Postgresql versions in
> the FreeBSD ports tree (looks like postgresql.conf.sample is being
> patched anyway, so pretty easy to amend).

Yeah, we might end up doing this, but I consider it a workaround.

Kris

From: Kris Kennaway
On Wed, Apr 11, 2007 at 01:03:50AM -0400, Tom Lane wrote:
> Kris Kennaway <kris@obsecurity.org> writes:
> >>>>> I think the high number of setproctitle() calls are more problematic
> >>>>> to us at the moment, Kris can comment on that.
>
> > Since we've basically had it handed to us that calling setproctitle()
> > thousands of times per second is something that real applications now
> > do, we're pretty much forced to work on making it cheaper.
> > ...
> > However this won't help all the existing systems out there (including
> > other affected OSes), so it would be great if you guys could meet us
> > half way and find a way to make postgresql rate-limit these calls by
> > default to some suitable compromise rate, like once/second or
> > whatever.
>
> Well, the thing is, we've pretty much had it handed to us that
> current-command indicators that aren't up to date are not very useful.
> So rate-limited updates strike me as a useless compromise. We have
> the "real" solution (status advertised in PG's shared memory) already,
> so the question in my mind is just how fast DBAs will wish to transition
> to looking at "select * from pg_stat_activity" instead of looking at
> "ps auxww".

I don't get your argument - ps auxww is never going to be 100%
up-to-date because during the time the command is running the status
may change. So we already know that stats being a fraction of a
second out of date are acceptable to users, because that's what may
happen when you run ps in the present model. So you can use this to
get away with limiting updates to e.g. 10/second and in practise no
users will notice the difference.

Updating thousands of times a second just on the off chance that an
admin may one day run ps is completely inefficient (and has a huge
overhead on non-Linux systems, so it's demonstrably not a sensible way
to do things), and to the extent that there is a problem to be solved
it isn't even really solving it anyway.

If there really are users who find 10 proctitle updates/second an
unacceptably low update rate, then tune for the default case and
provide an option to allow them to override the rate limit to whatever
update rate they find appropriate.

Kris
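
The proposal above amounts to a simple time-based limiter around the title update. Here is a minimal sketch of that idea, assuming a hypothetical set_title callback in place of the real setproctitle() call and an injectable clock so the behaviour can be checked without sleeping. This is not PostgreSQL code, and the class name RateLimitedTitle is invented for illustration.

```python
import time


class RateLimitedTitle:
    """Forward at most max_per_second title updates to set_title."""

    def __init__(self, set_title, max_per_second=10, clock=time.monotonic):
        self._set_title = set_title  # hypothetical stand-in for setproctitle()
        self._min_interval = 1.0 / max_per_second
        self._clock = clock
        self._last = None    # time of the last forwarded update
        self.pending = None  # most recent title that was suppressed

    def update(self, title):
        """Apply title if the interval has elapsed; return True if applied."""
        now = self._clock()
        if self._last is None or now - self._last >= self._min_interval:
            self._set_title(title)
            self._last = now
            self.pending = None
            return True
        self.pending = title  # remembered, but nothing ever flushes it
        return False


if __name__ == "__main__":
    fake_now = [0.0]
    applied = []
    limiter = RateLimitedTitle(applied.append, clock=lambda: fake_now[0])
    for step, title in enumerate(["parse", "bind", "execute", "idle"]):
        fake_now[0] = step * 0.05  # one state change every 50 ms
        limiter.update(title)
    print(applied, "pending:", limiter.pending)
```

Note that a suppressed title stays in pending until the next call to update(); nothing flushes it on its own.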

Kris Kennaway <kris@obsecurity.org> writes:
> On Wed, Apr 11, 2007 at 01:03:50AM -0400, Tom Lane wrote:
>> Well, the thing is, we've pretty much had it handed to us that
>> current-command indicators that aren't up to date are not very useful.
>> So rate-limited updates strike me as a useless compromise.

> I don't get your argument - ps auxww is never going to be 100%
> up-to-date because during the time the command is running the status
> may change.

Of course. But we have already done the update-once-every-half-second
bit --- that was how pg_stat_activity used to work --- and our users
made clear that it's not good enough. So I don't see us expending
significant effort to convert the setproctitle code path to that
approach. The clear way of the future for expensive-setproctitle
platforms is just to turn it off entirely and rely on the new
pg_stat_activity implementation.

regards, tom lane

From: Gregory Stark
"Kris Kennaway" <kris@obsecurity.org> writes:
> If there really are users who find 10 proctitle updates/second an
> unacceptably low update rate, then tune for the default case and
> provide an option to allow them to override the rate limit to whatever
> update rate they find appropriate.

If you rate limit the naive way you would end up with info that's
arbitrarily old and out of date. To get something that's guaranteed not
to be older than some maximum age we would have to start with setting
timers and setting the proctitle in a signal handler which would be much
more complex than what I think you're imagining.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From: Bruce Momjian
Tom Lane wrote:
> Kris Kennaway <kris@obsecurity.org> writes:
> > On Wed, Apr 11, 2007 at 01:03:50AM -0400, Tom Lane wrote:
> >> Well, the thing is, we've pretty much had it handed to us that
> >> current-command indicators that aren't up to date are not very useful.
> >> So rate-limited updates strike me as a useless compromise.
>
> > I don't get your argument - ps auxww is never going to be 100%
> > up-to-date because during the time the command is running the status
> > may change.
>
> Of course. But we have already done the update-once-every-half-second
> bit --- that was how pg_stat_activity used to work --- and our users
> made clear that it's not good enough. So I don't see us expending
> significant effort to convert the setproctitle code path to that
> approach. The clear way of the future for expensive-setproctitle
> platforms is just to turn it off entirely and rely on the new
> pg_stat_activity implementation.

8.3 will modify less memory to update the process title than happened in
the past --- perhaps that will reduce the overhead, but I doubt it. You
can test CVS HEAD to check it.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From: Maxime Henrion
Tom Lane wrote:
> Kris Kennaway <kris@obsecurity.org> forwards:
> > Yes but there are still a lot of wakeups to be avoided in the current
> > System V semaphore code. More specifically, not only do we wake up all
> > the processes waiting on a single semaphore every time something changes,
> > but we also wake up all processes waiting on *any* of the semaphores in
> > the semaphore *set*, whatever the reason we're sleeping.

Thanks for forwarding my mail, Kris! To Tom: if you can get my mails
to reach pgsql-hackers@ somehow that would be just great :-).

> Ohhhh ... *that's* the problem. Ugh. Although we have a separate
> semaphore for each PG backend, they're grouped into semaphore sets
> (I think 16 active semaphores per set). So a wakeup intended for one
> process would uselessly send up to 15 others through the semop code.

Yes.

> The only thing we could do to fix that from our end would be to use
> a smaller sema-set size on *BSD platforms. Is the overhead per sema set
> small enough to make this a sane thing to do? Will we be likely to
> run into system limits on the number of sets?

I'm not familiar enough with the PostgreSQL code to know what impact
such a change could have, but since the problem is clearly on our
side here, I would advise against doing changes in PostgreSQL that
are likely to complicate the code for little gain. We still didn't
even fully measure how much the useless wakeups cost us since we're
running into other contention problems with my patch that removes
those. And, as you point out, there are complications ensuing with
respect to system limits (we already ask users to bump them when
they install PostgreSQL).

I'm looking forward to fixing/rewriting all of the FreeBSD SysV semaphore
code and am just waiting for a green light from my boss before doing
so. Maybe someone will beat me to it, since it isn't such a big change.

I think the high number of setproctitle() calls are more problematic
to us at the moment, Kris can comment on that.

Cheers,
Maxime

From: Kris Kennaway
On Thu, Apr 12, 2007 at 12:57:32PM -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Kris Kennaway <kris@obsecurity.org> writes:
> > > On Wed, Apr 11, 2007 at 01:03:50AM -0400, Tom Lane wrote:
> > >> Well, the thing is, we've pretty much had it handed to us that
> > >> current-command indicators that aren't up to date are not very useful.
> > >> So rate-limited updates strike me as a useless compromise.
> >
> > > I don't get your argument - ps auxww is never going to be 100%
> > > up-to-date because during the time the command is running the status
> > > may change.
> >
> > Of course. But we have already done the update-once-every-half-second
> > bit --- that was how pg_stat_activity used to work --- and our users
> > made clear that it's not good enough. So I don't see us expending
> > significant effort to convert the setproctitle code path to that
> > approach. The clear way of the future for expensive-setproctitle
> > platforms is just to turn it off entirely and rely on the new
> > pg_stat_activity implementation.
>
> 8.3 will modify less memory to update the process title than happened in
> the past --- perhaps that will reduce the overhead, but I doubt it. You
> can test CVS HEAD to check it.

Yeah, this is not relevant for BSD, it uses a syscall to set it (which
is why it has high overhead) instead of just modifying user memory.

Kris