Thread: refusing connections based on load ...
Anyone thought of implementing this, similar to how sendmail does it? If load > n, refuse connections?
Basically, it's great to set max clients to 256, but if load hits 50 as a result, the database is near to useless ... if you set it to 256, and 254 idle connections are going, load won't rise much, so it's safe, but if half of those processes are active, it hurts ...
So, if a .conf variable could be set so that connections are refused when max connections == 256 *or* load > n, you'd have the best of both worlds ...
sendmail does it now, and it's apparently relatively portable across OSs ... okay, just looked at the code, and it's kinda painful, but it's in src/conf.c, as a 'getla' function ...
If nobody is working on something like this, does anyone but me feel that it has merit to make use of? I'll play with it if so ...
On Mon, Apr 23, 2001 at 03:09:53PM -0300, The Hermit Hacker wrote: > > Anyone thought of implementing this, similar to how sendmail does it? If > load > n, refuse connections? > ... > If nobody is working on something like this, does anyone but me feel that > it has merit to make use of? I'll play with it if so ... I agree that it would be useful. Even more useful would be soft load shedding, where once some load average level is exceeded the postmaster delays a bit (proportionately) before accepting a connection. Nathan Myers ncm@zembu.com
Nathan Myers wrote:
> On Mon, Apr 23, 2001 at 03:09:53PM -0300, The Hermit Hacker wrote:
> >
> > Anyone thought of implementing this, similar to how sendmail does it? If
> > load > n, refuse connections?
> > ...
> > If nobody is working on something like this, does anyone but me feel that
> > it has merit to make use of? I'll play with it if so ...
>
> I agree that it would be useful. Even more useful would be soft load
> shedding, where once some load average level is exceeded the postmaster
> delays a bit (proportionately) before accepting a connection.
Or have the load check on AtXactStart, and delay new transactions until load is back below x, where x is configurable per user/group plus some per database scaling factor.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
The Hermit Hacker <scrappy@hub.org> writes: > sendmail does it now, and, apparently relatively portable across OSs ... sendmail expects to be root. It's unlikely (and very undesirable) that postgres will be installed with adequate privileges to read /dev/kmem, which is what it'd take to run the sendmail loadaverage code on most platforms... regards, tom lane
On Mon, 23 Apr 2001, Tom Lane wrote: > The Hermit Hacker <scrappy@hub.org> writes: > > sendmail does it now, and, apparently relatively portable across OSs ... > > sendmail expects to be root. It's unlikely (and very undesirable) that > postgres will be installed with adequate privileges to read /dev/kmem, > which is what it'd take to run the sendmail loadaverage code on most > platforms... Actually, not totally accurate ... sendmail has a 'RunAs' option for those that don't wish to have it run as root, and still works for the loadavg stuff, to the best of my knowledge (its an option I haven't played with yet) ...
* The Hermit Hacker <scrappy@hub.org> [010423 21:38]:
> On Mon, 23 Apr 2001, Tom Lane wrote:
>
> > The Hermit Hacker <scrappy@hub.org> writes:
> > > sendmail does it now, and, apparently relatively portable across OSs ...
> >
> > sendmail expects to be root. It's unlikely (and very undesirable) that
> > postgres will be installed with adequate privileges to read /dev/kmem,
> > which is what it'd take to run the sendmail loadaverage code on most
> > platforms...
>
> Actually, not totally accurate ... sendmail has a 'RunAs' option for those
> that don't wish to have it run as root, and still works for the loadavg
> stuff, to the best of my knowledge (its an option I haven't played with
> yet) ...
And 8.12.x will have some other options as well.... Like the SUBMISSION prog only needs to be SGID, not SUID....
LER
--
Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: ler@lerctr.org US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
The Hermit Hacker <scrappy@hub.org> writes: > On Mon, 23 Apr 2001, Tom Lane wrote: >> sendmail expects to be root. > Actually, not totally accurate ... sendmail has a 'RunAs' option for those > that don't wish to have it run as root, True, it doesn't *have* to be root, but the loadavg code still requires privileges beyond those of mere mortals (as does listening on port 25, last I checked). On my HPUX box: $ ls -l /dev/kmem crw-r----- 1 bin sys 3 0x000001 Jun 10 1996 /dev/kmem so postgres would have to run setuid bin or setgid sys to read the load average. Either one is equivalent to giving an attacker the keys to the kingdom (overwrite a few key /usr/bin/ executables and wait for root to run one...) On Linux and BSD it seems to be more common to put /dev/kmem into a specialized group "kmem", so running postgres as setgid kmem is not so immediately dangerous. Still, do you think it's a good idea to let an attacker have open-ended rights to read your kernel memory? It wouldn't take too much effort to sniff passwords, for example. Basically, if we do this then we are abandoning the notion that Postgres runs as an unprivileged user. I think that's a BAD idea, especially in an environment that's open enough that you might feel the need to load-throttle your users. By definition you do not trust them, eh? A less dangerous way of approaching it might be to have an option whereby the postmaster invokes 'uptime' via system() every so often (maybe once a minute?) and throttles on the basis of the results. The reaction time would be poorer, but security would be a whole lot better. regards, tom lane
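A minimal sketch of the "invoke uptime every so often" idea, for illustration only. It uses popen() rather than system() because the command's output has to be read back. The helper names parse_uptime_load and read_uptime_load are hypothetical, and the parser assumes that uptime prints the text "load average" followed by a colon, as common Linux and BSD versions do; that format is not guaranteed on every platform.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose popen()/pclose() under strict modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the 1-minute load average from one line of
 * uptime output. Returns 0 on success, -1 if no load average is present. */
int parse_uptime_load(const char *line, double *load1)
{
    const char *p;
    char *end;

    assert(line != NULL && load1 != NULL);

    p = strstr(line, "load average");   /* also matches "load averages" */
    if (p == NULL)
        return -1;
    p = strchr(p, ':');                 /* the numbers follow the colon */
    if (p == NULL)
        return -1;
    p++;

    *load1 = strtod(p, &end);           /* strtod skips leading blanks */
    return (end == p) ? -1 : 0;         /* end == p: nothing was parsed */
}

/* Hypothetical helper: run uptime once and parse its first output line.
 * Returns 0 on success, -1 if the command or the parse failed. */
int read_uptime_load(double *load1)
{
    char buf[256];
    int rc = -1;
    FILE *fp = popen("uptime", "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof buf, fp) != NULL)
        rc = parse_uptime_load(buf, load1);
    pclose(fp);
    return rc;
}
```

A caller would poll read_uptime_load() at a coarse interval (the message above suggests about once a minute) and cache the result. A return of -1 is best treated as "load unknown" rather than as zero load, since some systems print no load average at all. Note also that strtod() is locale-sensitive, so a real implementation would want to run in the "C" locale.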
* Larry Rosenman <ler@lerctr.org> [010423 21:45]:
> * The Hermit Hacker <scrappy@hub.org> [010423 21:38]:
> > On Mon, 23 Apr 2001, Tom Lane wrote:
> >
> > > The Hermit Hacker <scrappy@hub.org> writes:
> > > > sendmail does it now, and, apparently relatively portable across OSs ...
> > >
> > > sendmail expects to be root. It's unlikely (and very undesirable) that
> > > postgres will be installed with adequate privileges to read /dev/kmem,
> > > which is what it'd take to run the sendmail loadaverage code on most
> > > platforms...
> >
> > Actually, not totally accurate ... sendmail has a 'RunAs' option for those
> > that don't wish to have it run as root, and still works for the loadavg
> > stuff, to the best of my knowledge (its an option I haven't played with
> > yet) ...
> And 8.12.x will have some other options as well....
>
> Like the SUBMISSION prog only needs to be SGID, not SUID....
Actually, the sendmail DAEMON will still have ROOT privs, so it can read /dev/kmem. I suspect I don't have as much of an issue if we are sgid kmem...
LER
--
Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: ler@lerctr.org US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
* Tom Lane <tgl@sss.pgh.pa.us> [010423 21:54]:
> The Hermit Hacker <scrappy@hub.org> writes:
> On my HPUX box:
>
> $ ls -l /dev/kmem
> crw-r----- 1 bin sys 3 0x000001 Jun 10 1996 /dev/kmem
>
> so postgres would have to run setuid bin or setgid sys to read the load
> average. Either one is equivalent to giving an attacker the keys to the
> kingdom (overwrite a few key /usr/bin/ executables and wait for root to
> run one...)
On my UnixWare box it's 0440 sys.sys....
>
> On Linux and BSD it seems to be more common to put /dev/kmem into a
> specialized group "kmem", so running postgres as setgid kmem is not so
> immediately dangerous. Still, do you think it's a good idea to let an
> attacker have open-ended rights to read your kernel memory? It wouldn't
> take too much effort to sniff passwords, for example.
>
> Basically, if we do this then we are abandoning the notion that Postgres
> runs as an unprivileged user. I think that's a BAD idea, especially in
> an environment that's open enough that you might feel the need to
> load-throttle your users. By definition you do not trust them, eh?
>
> A less dangerous way of approaching it might be to have an option
> whereby the postmaster invokes 'uptime' via system() every so often
> (maybe once a minute?) and throttles on the basis of the results.
> The reaction time would be poorer, but security would be a whole lot
> better.
Then there are boxes like my UnixWare one where the load average is not available AT ALL:
    $ uptime
     10:05pm up 2 days, 3:16, 3 users
    $
It's a threaded kernel, and SCO/Novell/whoever has removed all traces from userland of the load average. avenrun[] is still a symbol in the kernel, but...
--
Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: ler@lerctr.org US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
> Rather than do system('uptime') and incur the process start-up each time, > you could do fp = popen('vmstat 60', 'r'), then just read the fp. popen doesn't incur a process start? Get real. But you're right, popen() is the right call not system(), because you need to read the stdout. > I believe vmstat is fairly standard. Not more so than uptime --- and the latter's output format is definitely less variable across platforms. The HPUX man page goes so far as to say WARNINGS Users of vmstat must not rely on the exact field widths and spacing of its output, as these will vary dependingon the system, the release of HP-UX, and the data to be displayed. and that's just for *one* platform. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> On Linux and BSD it seems to be more common to put /dev/kmem into a
> specialized group "kmem", so running postgres as setgid kmem is not so
> immediately dangerous. Still, do you think it's a good idea to let an
> attacker have open-ended rights to read your kernel memory? It wouldn't
> take too much effort to sniff passwords, for example.
On Linux you can get the load average by doing `cat /proc/loadavg'. On NetBSD you can get the load average via a sysctl. On those systems and others the uptime program is neither setuid nor setgid.
> A less dangerous way of approaching it might be to have an option
> whereby the postmaster invokes 'uptime' via system() every so often
> (maybe once a minute?) and throttles on the basis of the results.
> The reaction time would be poorer, but security would be a whole lot
> better.
That is the way to do it on systems where obtaining the load average requires special privileges. But do you really need the load average once a minute? The load average printed by uptime is just as accurate as the load average obtained by examining the kernel.
Ian
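For the Linux case mentioned here, a minimal sketch of reading /proc/loadavg from an unprivileged process. The function names are hypothetical, and the path is passed in as a parameter only to keep the sketch easy to exercise. It assumes the usual procfs layout, in which the line starts with the 1-, 5- and 15-minute averages.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse the three load averages from a line in the
 * /proc/loadavg format, e.g. "0.20 0.18 0.12 1/80 11206".
 * Returns 0 on success, -1 if the line does not start with three numbers. */
int parse_loadavg_line(const char *line, double la[3])
{
    assert(line != NULL && la != NULL);
    return (sscanf(line, "%lf %lf %lf", &la[0], &la[1], &la[2]) == 3) ? 0 : -1;
}

/* Hypothetical helper: read the load averages from a procfs-style file.
 * No special privileges are needed to open /proc/loadavg on Linux.
 * Returns 0 on success, -1 if the file cannot be read or parsed. */
int read_loadavg_file(const char *path, double la[3])
{
    char buf[128];
    int rc = -1;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof buf, fp) != NULL)
        rc = parse_loadavg_line(buf, la);
    fclose(fp);
    return rc;
}
```

On Linux the call would be read_loadavg_file("/proc/loadavg", la). On systems without procfs the open simply fails and the caller can fall back to another method.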
other than a potential buffer overrun, what would be the problem with:
    open(kmem)
    read values
    close(kmem)
?
I would think it would be less taxing to the system than doing a system() call, but still effectively as safe, no?
On Mon, 23 Apr 2001, Tom Lane wrote:
> The Hermit Hacker <scrappy@hub.org> writes:
> > On Mon, 23 Apr 2001, Tom Lane wrote:
> >> sendmail expects to be root.
>
> > Actually, not totally accurate ... sendmail has a 'RunAs' option for those
> > that don't wish to have it run as root,
>
> True, it doesn't *have* to be root, but the loadavg code still requires
> privileges beyond those of mere mortals (as does listening on port 25,
> last I checked).
>
> On my HPUX box:
>
> $ ls -l /dev/kmem
> crw-r----- 1 bin sys 3 0x000001 Jun 10 1996 /dev/kmem
>
> so postgres would have to run setuid bin or setgid sys to read the load
> average. Either one is equivalent to giving an attacker the keys to the
> kingdom (overwrite a few key /usr/bin/ executables and wait for root to
> run one...)
>
> On Linux and BSD it seems to be more common to put /dev/kmem into a
> specialized group "kmem", so running postgres as setgid kmem is not so
> immediately dangerous. Still, do you think it's a good idea to let an
> attacker have open-ended rights to read your kernel memory? It wouldn't
> take too much effort to sniff passwords, for example.
>
> Basically, if we do this then we are abandoning the notion that Postgres
> runs as an unprivileged user. I think that's a BAD idea, especially in
> an environment that's open enough that you might feel the need to
> load-throttle your users. By definition you do not trust them, eh?
>
> A less dangerous way of approaching it might be to have an option
> whereby the postmaster invokes 'uptime' via system() every so often
> (maybe once a minute?) and throttles on the basis of the results.
> The reaction time would be poorer, but security would be a whole lot
> better.
>
> regards, tom lane
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On 23 Apr 2001, Ian Lance Taylor wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > > On Linux and BSD it seems to be more common to put /dev/kmem into a > > specialized group "kmem", so running postgres as setgid kmem is not so > > immediately dangerous. Still, do you think it's a good idea to let an > > attacker have open-ended rights to read your kernel memory? It wouldn't > > take too much effort to sniff passwords, for example. > > On Linux you can get the load average by doing `cat /proc/loadavg'. > On NetBSD you can get the load average via a sysctl. On those systems > and others the uptime program is neither setuid nor setgid. Good call ... FreeBSD has it also, and needs no special privileges ... just checked, and the sysctl command isn't setuid/setgid anything, so I'm guessing that using sysctl() to pull these values shouldn't create any security issues on those systems that support it ?
At 03:09 PM 23-04-2001 -0300, you wrote: > >Anyone thought of implementing this, similar to how sendmail does it? If >load > n, refuse connections? > >Basically, if great to set max clients to 256, but if load hits 50 as a >result, the database is near to useless ... if you set it to 256, and 254 >idle connections are going, load won't rise much, so is safe, but if half >of those processes are active, it hurts ... Sorry, but I still don't understand the reasons why one would want to do this. Could someone explain? I'm thinking that if I allow 256 clients, and my hardware/OS bogs down when 60 users are doing lots of queries, I either accept that, or figure that my hardware/OS actually can't cope with that many clients and reduce the max clients or upgrade the hardware (or maybe do a little tweaking here and there). Why not be more deterministic about refusing connections and stick to reducing max clients? If not it seems like a case where you're promised something but when you need it, you can't have it. Cheerio, Link.
Tom Lane <tgl@sss.pgh.pa.us> writes: > > Rather than do system('uptime') and incur the process start-up each time, > > you could do fp = popen('vmstat 60', 'r'), then just read the fp. > > popen doesn't incur a process start? Get real. But you're right, popen() > is the right call not system(), because you need to read the stdout. Tom, I think the point here is that the 'vmstat' process, once started, will keep printing status output every 60 seconds (if invoked as above) so you don't have to restart it every minute, just read the pipe. > > I believe vmstat is fairly standard. > > Not more so than uptime --- and the latter's output format is definitely > less variable across platforms. The HPUX man page goes so far as to say > > WARNINGS > Users of vmstat must not rely on the exact field widths and spacing of > its output, as these will vary depending on the system, the release of > HP-UX, and the data to be displayed. > > and that's just for *one* platform. A very valid objection. I'm also dubious as to the utility of the whole concept. What happens when Sendmail refuses a message based on load? It is requeued on the sending end to be tried later. What happens when PG refuses a new client connection based on load? The application stops working. Is this really better than having slow response time because the server is thrashing? I guess my point is that Sendmail is a store-and-forward situation where the mail system can "catch up" once the load returns to normal. Whereas, I would think, the majority of PG installations want a working database, and whether it's refusing connections due to load or simply bogged down isn't going to make a difference to users that can't get their data. -Doug -- The rain man gave me two cures; he said jump right in, The first was Texas medicine--the second was just railroad gin, And like a fool I mixed them, and it strangled up my mind, Now people just get uglier, and I got no sense of time... --Dylan
On Mon, Apr 23, 2001 at 10:50:42PM -0400, Tom Lane wrote: > Basically, if we do this then we are abandoning the notion that Postgres > runs as an unprivileged user. I think that's a BAD idea, especially in > an environment that's open enough that you might feel the need to > load-throttle your users. By definition you do not trust them, eh? No. It's not a case of trust, but of providing an adaptive way to keep performance reasonable. The users may have no independent way to cooperate to limit load, but the DB can provide that. > A less dangerous way of approaching it might be to have an option > whereby the postmaster invokes 'uptime' via system() every so often > (maybe once a minute?) and throttles on the basis of the results. > The reaction time would be poorer, but security would be a whole lot > better. Yes, this alternative looks much better to me. On Linux you have the much more efficient alternative, /proc/loadavg. (I wouldn't use system(), though.) Nathan Myers ncm@zembu.com
On Tue, Apr 24, 2001 at 12:39:29PM +0800, Lincoln Yeoh wrote: > At 03:09 PM 23-04-2001 -0300, you wrote: > >Basically, if great to set max clients to 256, but if load hits 50 > >as a result, the database is near to useless ... if you set it to 256, > >and 254 idle connections are going, load won't rise much, so is safe, > >but if half of those processes are active, it hurts ... > > Sorry, but I still don't understand the reasons why one would want to do > this. Could someone explain? > > I'm thinking that if I allow 256 clients, and my hardware/OS bogs down > when 60 users are doing lots of queries, I either accept that, or > figure that my hardware/OS actually can't cope with that many clients > and reduce the max clients or upgrade the hardware (or maybe do a > little tweaking here and there). > > Why not be more deterministic about refusing connections and stick > to reducing max clients? If not it seems like a case where you're > promised something but when you need it, you can't have it. The point is that "number of connections" is a very poor estimate of system load. Sometimes a connection is busy, sometimes it's not. Some connections are busy, some are not. The goal is maximum throughput or some tradeoff of maximum throughput against latency. If system throughput varies nonlinearly with load (as it almost always does) then this happens at some particular load level. Refusing a connection and letting the client try again later can be a way to maximize throughput by keeping the system at the optimum point. (Waiting reduces delay. Yes, this is counterintuitive, but why do we queue up at ticket windows?) Delaying response, when under excessive load, to clients who already have a connection -- even if they just got one -- can have a similar effect, but with finer granularity and with less complexity in the clients. Nathan Myers ncm@zembu.com
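The "delay a bit (proportionately)" idea can be sketched as a pure function. Everything here is an assumption made for illustration: the name accept_delay_ms, the linear scaling, and the cap are not taken from any PostgreSQL code, and other delay curves would be equally valid.

```c
#include <assert.h>

/* Hypothetical soft load shedding: how long to wait before accepting a new
 * connection (or starting a new transaction) at the given load average.
 *
 *   load             current load average
 *   threshold        load below which no delay is applied
 *   ms_per_load_unit extra milliseconds per unit of load above the threshold
 *   max_delay_ms     upper bound, so clients are slowed rather than starved
 */
long accept_delay_ms(double load, double threshold,
                     double ms_per_load_unit, long max_delay_ms)
{
    double delay;

    assert(ms_per_load_unit >= 0.0 && max_delay_ms >= 0);

    if (load <= threshold)
        return 0;                       /* not overloaded: accept at once */

    delay = (load - threshold) * ms_per_load_unit;
    if (delay > (double) max_delay_ms)
        return max_delay_ms;            /* clamp to the configured ceiling */
    return (long) delay;
}
```

With a threshold of 4, 250 ms per unit of load and a 5 second cap, a load of 6 gives a 500 ms delay, while a load of 100 is clamped to 5000 ms. Unlike outright refusal, the client always gets its connection eventually.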
Tom Lane writes:
> The Hermit Hacker <scrappy@hub.org> writes:
> > sendmail does it now, and, apparently relatively portable across OSs ...
>
> sendmail expects to be root. It's unlikely (and very undesirable) that
> postgres will be installed with adequate privileges to read /dev/kmem,
> which is what it'd take to run the sendmail loadaverage code on most
> platforms...
This program:
    #include <stdio.h>
    #include <stdlib.h>   /* declares getloadavg() on Linux and FreeBSD */

    int main(void)
    {
        double la[3];

        /* getloadavg() returns the number of samples filled in, or -1 */
        if (getloadavg(la, 3) == -1)
        {
            perror("getloadavg");
            return 1;
        }

        printf("%f %f %f\n", la[0], la[1], la[2]);

        return 0;
    }
works unprivileged on Linux 2.2 and FreeBSD 4.3. Rumour[*] also has it that there is a way to do this on Solaris and HP-UX 9. So I think that covers enough users to be worthwhile.
[*] - Autoconf AC_FUNC_GETLOADAVG
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Apparently so under Solaris ...
    hestia:/> uname -a
    SunOS hestia 5.7 Generic_106542-12 i86pc i386 i86pc

    C Library Functions                                getloadavg(3C)
    NAME
         getloadavg - get system load averages
    SYNOPSIS
         #include <sys/loadavg.h>
         int getloadavg(double loadavg[], int nelem);
    DESCRIPTION
How hard would it be to knock up code that, by default, ignores loadavg, but if, say, set in postgresql.conf:
    loadavg = 4
it will just refuse connections?
On Tue, 24 Apr 2001, Peter Eisentraut wrote:
> Tom Lane writes:
>
> > The Hermit Hacker <scrappy@hub.org> writes:
> > > sendmail does it now, and, apparently relatively portable across OSs ...
> >
> > sendmail expects to be root. It's unlikely (and very undesirable) that
> > postgres will be installed with adequate privileges to read /dev/kmem,
> > which is what it'd take to run the sendmail loadaverage code on most
> > platforms...
>
> This program:
>
> #include <stdio.h>
>
> int main()
> {
>     double la[3];
>
>     if (getloadavg(la, 3) == -1)
>         perror("getloadavg");
>
>     printf("%f %f %f\n", la[0], la[1], la[2]);
>
>     return 0;
> }
>
> works unprivileged on Linux 2.2 and FreeBSD 4.3. Rumour[*] also has it
> that there is a way to do this on Solaris and HP-UX 9. So I think that
> covers enough users to be worthwhile.
>
> [*] - Autoconf AC_FUNC_GETLOADAVG
>
> --
> Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
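A rough sketch of the check being asked about, under stated assumptions: the setting is called loadavg as in the message, a value of zero or less means "ignore load" (the proposed default), and the function names are hypothetical. It includes <stdlib.h>, where glibc and FreeBSD declare getloadavg(); on Solaris the man page excerpt above indicates <sys/loadavg.h> instead.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose getloadavg() in <stdlib.h> */
#endif
#include <assert.h>
#include <stdlib.h>

/* Hypothetical policy check: does `load` exceed the configured limit?
 * A limit <= 0 means the feature is switched off. */
int load_exceeds_limit(double load, double limit)
{
    assert(load >= 0.0);    /* load averages are never negative */

    if (limit <= 0.0)
        return 0;           /* default: ignore load entirely */
    return load > limit;
}

/* Hypothetical gate, called before accepting a connection. Returns 1 if
 * the connection should be refused, 0 otherwise. If the load average
 * cannot be obtained, it errs on the side of accepting. */
int should_refuse_connection(double limit)
{
    double la[1];

    if (limit <= 0.0)
        return 0;
    if (getloadavg(la, 1) != 1)
        return 0;           /* load unknown: do not refuse */
    return load_exceeds_limit(la[0], limit);
}
```

Failing open when getloadavg() returns an error is a design choice for this sketch; an administrator who would rather refuse connections when the load is unknown would invert that branch.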
Doug McNaught wrote:
> A very valid objection. I'm also dubious as to the utility of the
> whole concept. What happens when Sendmail refuses a message based on
> load? It is requeued on the sending end to be tried later. What
> happens when PG refuses a new client connection based on load? The
> application stops working. Is this really better than having slow
> response time because the server is thrashing?
That's exactly why I suggested delaying transaction starts instead. The client app always gets the connection. Doing dialog steps inside of open transactions is always a bad design, leading to a couple of problems (coffee break with open locks), so we can assume that if an application starts a transaction, it'll keep this one backend as busy as possible until the transaction ends. Processing too many transactions in parallel is what gets the system into heavy swapping and exponential usage of resources. So if we delay starting transactions while the system load is above the limit, we probably speed up the overall per-transaction response time, increasing the throughput. And that's what this discussion is all about, no?
Jan
Doug McNaught writes: > A very valid objection. I'm also dubious as to the utility of the > whole concept. What happens when Sendmail refuses a message based on > load? It is requeued on the sending end to be tried later. What > happens when PG refuses a new client connection based on load? The > application stops working. Is this really better than having slow > response time because the server is thrashing? The concept is just as dubious as the concept of rejecting clients based on how many clients are already connected. There are some technical reasons for the latter, but it is still used as an administrative tool. The rule is, if you don't like it, don't use it. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
At 10:59 PM 23-04-2001 -0700, Nathan Myers wrote:
>On Tue, Apr 24, 2001 at 12:39:29PM +0800, Lincoln Yeoh wrote:
>> Why not be more deterministic about refusing connections and stick
>> to reducing max clients? If not it seems like a case where you're
>> promised something but when you need it, you can't have it.
>
>The point is that "number of connections" is a very poor estimate of
>system load. Sometimes a connection is busy, sometimes it's not.
Actually I use number of connections to estimate how much RAM I will need, not for estimating system load.
Because once the system runs out of RAM, performance drops a lot. If I can prevent the system running out of RAM, it can usually take whatever I throw at it at near the max throughput.
For my app say the max is X hits per second with a few concurrent transactions. When I boost the number of concurrent transactions (e.g. 25 on a 128MB machine, load~13) it goes down to maybe 0.95X hits per second[1]. This is acceptable to me. But once the machine starts swapping, things bog down drastically and some connections get Server Error.
>Refusing a connection and letting the client try again later can be
>a way to maximize throughput by keeping the system at the optimum
>point. (Waiting reduces delay. Yes, this is counterintuitive, but
>why do we queue up at ticket windows?)
>
>Delaying response, when under excessive load, to clients who already
>have a connection -- even if they just got one -- can have a similar
>effect, but with finer granularity and with less complexity in the
>clients.
With my web apps, refusing connection based on load doesn't help at all, they are fastcgi processes and are already holding database connections open, before even getting a web request (might as well open the db connection before the client talks to you).
For other apps maybe refusing connection could help. But are these cases in the majority? In say a bank teller environment, the database connections are probably already open, and could remain open the whole day.
Delaying transactions based on load is easier to understand for me.
Cheerio, Link.
[1] This is a guesstimate: the hits per second drops gradually during the benchmark. The speed for a low concurrent test run AFTER the benchmark had a slower hits per second than the benchmark figures. This is probably because there was a lot of selecting and updating of the same row, and Postgresql needs a vacuum before the speed goes back up. Seems like the dead rows get in the way of the index or something - speed doesn't slow down as much for lots of inserts and selects.
On Wed, 25 Apr 2001, Lincoln Yeoh wrote: > At 10:59 PM 23-04-2001 -0700, Nathan Myers wrote: > >On Tue, Apr 24, 2001 at 12:39:29PM +0800, Lincoln Yeoh wrote: > >> Why not be more deterministic about refusing connections and stick > >> to reducing max clients? If not it seems like a case where you're > >> promised something but when you need it, you can't have it. > > > >The point is that "number of connections" is a very poor estimate of > >system load. Sometimes a connection is busy, sometimes it's not. > > Actually I use number of connections to estimate how much RAM I will need, > not for estimating system load. > > Because once the system runs out of RAM, performance drops a lot. If I can > prevent the system running out of RAM, it can usually take whatever I throw > at it at near the max throughput. I have a Dual-866, 1gig of RAM and strip'd file systems ... this past week, I've hit many times where CPU usage is 100%, RAM is 500Meg free and disks are pretty much sitting idle ... It turns out, in this case, that vacuum was in order (i vacuum 12x per day now instead of 6), so that now it will run with 300 simultaneous connections, but with a loadavg of 68 or so, 300 connections are just building on each other to slow the rest down :(
At 11:28 PM 24-04-2001 -0300, The Hermit Hacker wrote: > >I have a Dual-866, 1gig of RAM and strip'd file systems ... this past >week, I've hit many times where CPU usage is 100%, RAM is 500Meg free and >disks are pretty much sitting idle ... > >It turns out, in this case, that vacuum was in order (i vacuum 12x per day >now instead of 6), so that now it will run with 300 simultaneous >connections, but with a loadavg of 68 or so, 300 connections are just >building on each other to slow the rest down :( > Hmm then maybe we should refuse connections based on "need to vacuum"... :). Seriously though does the _total_ work throughput go down significantly when you have high loads? I got a load 13 with 25 concurrent connections (not much), and yeah things took longer but the hits per second wasn't very much different from the peak possible with fewer connections. Basically in my case almost the same amount of work is being done per second. So maybe higher loads might be fine on your more powerful system? Cheerio, Link.
On Tue, Apr 24, 2001 at 11:28:17PM -0300, The Hermit Hacker wrote: > I have a Dual-866, 1gig of RAM and strip'd file systems ... this past > week, I've hit many times where CPU usage is 100%, RAM is 500Meg free and > disks are pretty much sitting idle ... Assuming "strip'd" above means "striped", it strikes me that you might be much better off operating the drives independently, with the various tables, indexes, and logs scattered each entirely on one drive. That way the heads can move around independently reading and writing N blocks, rather than all moving in concert reading or writing only one block at a time. (Striping the WAL file on a couple of raw devices might be a good idea along with the above. Can we do that?) But of course speculation is much less useful than trying it. Some measurements before and after would be really, really interesting to many of us. Nathan Myers ncm@zembu.com
On Tue, 24 Apr 2001, Nathan Myers wrote:
> On Tue, Apr 24, 2001 at 11:28:17PM -0300, The Hermit Hacker wrote:
> > I have a Dual-866, 1gig of RAM and strip'd file systems ... this past
> > week, I've hit many times where CPU usage is 100%, RAM is 500Meg free and
> > disks are pretty much sitting idle ...
>
> Assuming "strip'd" above means "striped", it strikes me that you
> might be much better off operating the drives independently, with
> the various tables, indexes, and logs scattered each entirely on one
> drive.
Have you ever tried to maintain a database doing this? PgSQL is definitely not designed for this sort of setup. I had symlinks going everywhere, and with the new numbering schema, this is even more difficult to try and do :)
The whole argument over how to get load averages seems rather silly, and it's moot if the idea of using the load information to alter PG behavior is rejected. I personally have no use for it, but I don't think it's a bad idea in general. Particularly given future redundancy/load sharing features. On the other hand, I think almost all of this stuff can and should be done outside of postmaster. Here is the 0-change version, for rejecting connections, and for operating systems that have built-in firewall capability, such as FreeBSD: a standalone daemon that adds a reject rule for the Postgres port when the load gets too high, and drops that rule when the load goes back down. Now here's the small-change version: add support to Postgres for a SET command or similar way to say "stop accepting connections", or "set accept/transaction delay to X". Write a standalone daemon which monitors the load and issues commands to Postgres as necessary. That daemon may need extra privileges, but it is small, auditable, and doesn't talk to the outside world. It's probably better to include in the Postgres protocol support for accepting (TCP-wise) a connection, then closing it with an error message, because this daemon needs to be able to connect to tell it to let users in again. It's probably as simple as always letting the superuser in. The latter is nicer in a number of ways. Persistent connections were already mentioned - rejecting new connections may not be a good enough solution there. With a fancier approach, you could even hang up on some existing connections with an appropriate message, or just NOTICE them that you're slowing them down or you'd like them to go away voluntarily. 
From a web-hosting standpoint, someday it would be nifty to have per-user-per-connection limits, so I could put up a couple of big PG servers and only allow user X one connection, which can't use more than Y amount of RAM, and passes a scheduling hint to the OS so it shares CPU time with other economy-class users, which can be throttled down to 25% of what ultra-mega-hosting users get. Simple load shedding is a baby step in the right direction. If nothing else, it will cast a spotlight on some of the problem areas. -- Christopher Masto Senior Network Monkey NetMonger Communications chris@netmonger.net info@netmonger.net http://www.netmonger.net Free yourself, free your machine, free the daemon -- http://www.freebsd.org/
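A minimal sketch of the "0-change" standalone daemon described above, assuming a Unix host where Python's `os.getloadavg()` is available. The names `next_state`, `run_daemon`, `add_rule`, and `drop_rule` are hypothetical, and so are the threshold values. The two callables stand in for whatever firewall command the platform provides, and no real firewall tool is invoked here. Using two thresholds instead of one is an assumption added so that the rule does not flap when the load hovers around a single cutoff.

```python
import os
import time


def next_state(blocked, load, high, low):
    """Return True if the port should be blocked after seeing `load`.

    Two thresholds give hysteresis: block above `high`, and unblock only
    once the load has fallen below `low`.
    """
    if not blocked and load > high:
        return True
    if blocked and load < low:
        return False
    return blocked


def run_daemon(add_rule, drop_rule, high=16.0, low=8.0, interval=15):
    """Poll the 1-minute load average and toggle the reject rule.

    `add_rule` and `drop_rule` are caller-supplied callables (hypothetical
    hooks); this sketch does not call any real firewall tool itself.
    """
    blocked = False
    while True:
        load = os.getloadavg()[0]      # 1-minute load average
        want = next_state(blocked, load, high, low)
        if want and not blocked:
            add_rule()                 # start rejecting new connections
        elif blocked and not want:
            drop_rule()                # let clients in again
        blocked = want
        time.sleep(interval)
```

Keeping the decision in a pure function makes the policy easy to check without touching the firewall or waiting for real load.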
Jan Wieck and I talked about this for awhile yesterday, and we came to the conclusion that load-average-based throttling is a Bad Idea. Quite aside from the portability and permissions issues that may arise in getting the numbers, the available numbers are the wrong thing: (1) On most Unix systems, the finest-grain load average that you can get is a 1-minute average. This will lose both on the ramp up (by the time you realize you overdid it, you've let *way* too many xacts through the starting gate) and on the ramp down (you'll hold off xacts for circa a minute after the crunch is past). (2) You can also get shorter-time-frame CPU usage numbers (at least, most versions of top(1) seem to display such things) but CPU load is really not very helpful for measuring how badly the system is thrashing. Postgres tends to beat your disks into the ground long before it pegs the CPU. Too bad there's no "disk usage" numbers. However, there is another possibility that would be simple to implement and perfectly portable: allow the dbadmin to impose a limit on the number of concurrent transactions. (Setting this equal to the max allowed number of backends would turn off the limit.) That way, you could have umpteen open connections, but you could limit how many of them were actually *doing* something at any given instant. If more than N try to start transactions at the same time, the later ones have to wait for the earlier ones to finish before they can start. This'd be trivial to do with a semaphore initialized to N --- P() it in StartTransaction and V() it in Commit/AbortTransaction. A concurrent-xacts limit isn't perfect of course, but I think it'd be pretty good, and certainly better than anything based on the available load-average numbers. regards, tom lane
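The semaphore scheme above can be sketched in a few lines. This is an illustration of the idea only, not backend code: `XactLimiter` and its method names are hypothetical, and threads stand in for backends. It uses Python's `threading.BoundedSemaphore`, where `acquire()` plays the role of P() and `release()` the role of V().

```python
import threading


class XactLimiter:
    """Cap the number of transactions running at once (hypothetical helper)."""

    def __init__(self, max_concurrent):
        # Semaphore initialized to N, as in the message above.
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def start_transaction(self):
        self._sem.acquire()      # P(): blocks while N transactions are active

    def end_transaction(self):
        self._sem.release()      # V(): called on both commit and abort

    def run(self, work):
        """Run `work()` as one transaction, releasing the slot even on error."""
        self.start_transaction()
        try:
            return work()
        finally:
            self.end_transaction()
```

Any number of connections can stay open; only the call to `start_transaction` waits, so idle connections cost nothing against the limit.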
Tom Lane writes: > A concurrent-xacts limit isn't perfect of course, but I think it'd > be pretty good, and certainly better than anything based on the > available load-average numbers. The concurrent transaction limit would allow you to control the absolute load of the PostgreSQL server, but we can already do that and it's not what we're after here. The idea behind the load average based approach is to make the postmaster respect the situation of the overall system. Additionally, the concurrent transaction limit would only be useful on setups that have a lot of idle transactions. Those setups exist, but not everywhere. To me, both of these approaches are in the "if you don't like it, don't use it" category. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
On Wed, 25 Apr 2001, Peter Eisentraut wrote: > Tom Lane writes: > > > A concurrent-xacts limit isn't perfect of course, but I think it'd > > be pretty good, and certainly better than anything based on the > > available load-average numbers. > > The concurrent transaction limit would allow you to control the absolute > load of the PostgreSQL server, but we can already do that and it's not > what we're after here. The idea behind the load average based approach is > to make the postmaster respect the situation of the overall system. > Additionally, the concurrent transaction limit would only be useful on > setups that have a lot of idle transactions. Those setups exist, but not > everywhere. > > To me, both of these approaches are in the "if you don't like it, don't > use it" category. Agreed ... by default, the loadavg method could be set to zero, to ignore ... I don't care if I'm off by 1min before I catch the increase, the fact is that I have caught it, and prevent any new ones coming in until it drops off again ... Make it two variables: transla rejectla if transla is hit, restrict on transactions, letting others connect, but putting them on hold while the la drops again ... if it goes above rejectla, refuse new connections altogether ... so now I can set something like: transla = 8 rejectla = 16 but if loadavg goes above 16, I want to get rid of what is causing the load to rise *before* adding new variables to the mix that will cause it to rise higher ... and your arg about permissions (Tom's, not Peter's) is moot in at least 3 of the major systems (Linux, *BSD and Solaris) as there is a getloadavg() function in all three for doing this ...
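The two-variable proposal reduces to a small decision function. The sketch below takes `transla` and `rejectla` from the message above. The function name `admission` and the three return labels are assumptions made for illustration, as is the reading that a threshold of zero switches that check off.

```python
def admission(load, transla=8.0, rejectla=16.0):
    """Classify a new request given the current load average.

    A threshold of 0 disables that check, matching the "set to zero,
    to ignore" default suggested above.
    """
    if rejectla > 0 and load > rejectla:
        return "reject"      # refuse new connections altogether
    if transla > 0 and load > transla:
        return "delay"       # accept the connection, hold new transactions
    return "accept"
```

With the example settings, a load of 10 would let a client connect but hold its transactions, and a load of 20 would refuse the connection outright.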
Peter Eisentraut <peter_e@gmx.net> writes: > The idea behind the load average based approach is > to make the postmaster respect the situation of the overall system. That'd be great if we could do it, but as I pointed out, the available stats do not allow us to do it very well. I think this will create a lot of portability headaches for no real gain. If it were something we could just do and forget, I would not object --- but the porting issues will create a LOT more work than I think this can possibly be worth. The fact that the work is distributed and will mostly be incurred by people other than the ones advocating the change doesn't improve matters. regards, tom lane
On Wed, Apr 25, 2001 at 09:41:57AM -0300, The Hermit Hacker wrote: > On Tue, 24 Apr 2001, Nathan Myers wrote: > > > On Tue, Apr 24, 2001 at 11:28:17PM -0300, The Hermit Hacker wrote: > > > I have a Dual-866, 1gig of RAM and strip'd file systems ... this past > > > week, I've hit many times where CPU usage is 100%, RAM is 500Meg free > > > and disks are pretty much sitting idle ... > > > > Assuming "strip'd" above means "striped", it strikes me that you > > might be much better off operating the drives independently, with > > the various tables, indexes, and logs scattered each entirely on one > > drive. > > have you ever tried to maintain a database doing this? PgSQL is > definitely not designed for this sort of setup, I had symlinks going > everywhere, and with the new numbering schema, this is even more > difficult to try and do :) Clearly you need to build a tool to organize it. It would help a lot if PG itself could provide some basic assistance, such as calling a stored procedure to generate the pathname of the file. Has there been any discussion of anything like that? Nathan Myers ncm@zembu.com
The Hermit Hacker wrote: > Agreed ... by default, the loadavg method could be set to zero, to ignore > ... I don't care if I'm off by 1min before I catch the increase, the fact > is that I have caught it, and prevent any new ones coming in until it > drops off again ... > > Make it two variables: > > transla > rejectla > > if transla is hit, restrict on transactions, letting others connect, but > putting them on hold while the la drops again ... if it goes above > rejectla, refuse new connections altogether ... > > so now I can set something like: > > transla = 8 > rejectla = 16 > > but if loadavg goes above 16, I want to get rid of what is causing the > load to rise *before* adding new variables to the mix that will cause it > to rise higher ... > > and your arg about permissions (Tom's, not Peter's) is moot in at least 3 > of the major systems (Linux, *BSD and Solaris) as there is a getloadavg() > function in all three for doing this ... I've just recompiled my php4 module to get sysvsem support and limited the number of concurrent DB transactions on the application level. The (not yet finished) TPC-C implementation I'm working on scales about 3-4 times better now. That's an improvement! This proves that limiting the number of concurrently running transactions is sufficient to keep the system load down. Combined these two look as follows: - We start with a fairly high setting in the semaphore. - When the system load exceeds the high-watermark, we don't increment the semaphore back after transaction end (need to ensure that at least a small minimum of xacts is left, but that's easy). - When the system goes back to normal load level, we slowly increase the semaphore again. This way we might have some peak pushing the system against the wall for a moment. If that doesn't go away quickly, we just delay users (who see some delay anyway actually).
Jan -- JanWieck@Yahoo.com
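The combined scheme (a transaction ceiling that the load average nudges up and down) might look roughly like the sketch below. Everything named here is hypothetical: `AdaptiveXactLimit`, its fields, and the choice to pass the current load into `end()`. A condition variable with an explicit counter replaces the raw semaphore so the ceiling can change while transactions are running. As a simplification, the ceiling recovers one slot per finished transaction rather than on a timer.

```python
import threading


class AdaptiveXactLimit:
    """Concurrent-transaction cap whose ceiling follows the load average."""

    def __init__(self, start, minimum, high_water, normal):
        self.start = start            # initial, fairly high ceiling
        self.ceiling = start          # current cap on concurrent transactions
        self.minimum = minimum        # never shrink below this many slots
        self.high_water = high_water  # load above this shrinks the ceiling
        self.normal = normal          # load below this lets it grow back
        self.active = 0
        self._cond = threading.Condition()

    def begin(self):
        """Wait for a free slot, then count this transaction as active."""
        with self._cond:
            while self.active >= self.ceiling:
                self._cond.wait()
            self.active += 1

    def end(self, load):
        """Finish a transaction and adjust the ceiling from `load`."""
        with self._cond:
            self.active -= 1
            if load > self.high_water and self.ceiling > self.minimum:
                self.ceiling -= 1     # "don't increment the semaphore back"
            elif load < self.normal and self.ceiling < self.start:
                self.ceiling += 1     # recover slowly, one slot at a time
            self._cond.notify_all()
```

Because the load only moves the ceiling and never gates an individual transaction, the slow reaction time of a 1-minute average matters less than it would in a direct accept/reject check.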
The Hermit Hacker <scrappy@hub.org> writes: > Autoconf has a 'LOADAVG' check already, so what is so problematic about > using that to enabled/disable that feature? Because it's tied to a GNU getloadavg.c implementation, which we'd have license problems with using. regards, tom lane
Jan Wieck <JanWieck@yahoo.com> writes: > This proves that limiting the number of concurrently running > transactions is sufficient to keep the system load down. > Combined these two look as follows: > - We start with a fairly high setting in the semaphore. > - When the system load exceeds the high-watermark, we don't > increment the semaphore back after transaction end (need > to ensure that at least a small minimum of xacts is left, > but that's easy). > - When the system goes back to normal load level, we slowly > increase the semaphore again. This is a nice way of dealing with the slow reaction time of the load average --- you don't let it directly drive the decision about when to start a new transaction, but instead let it tweak the ceiling on number of concurrent xacts. I like it. You probably don't need to have any additional "slowness" in the loop other than the inherent averaging in the kernel's load average. I'm still concerned about portability issues, and about whether load average is really the right number to be looking at, however. regards, tom lane
On Wed, 25 Apr 2001, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > The idea behind the load average based approach is > > to make the postmaster respect the situation of the overall system. > > That'd be great if we could do it, but as I pointed out, the available > stats do not allow us to do it very well. > > I think this will create a lot of portability headaches for no real > gain. If it were something we could just do and forget, I would not > object --- but the porting issues will create a LOT more work than > I think this can possibly be worth. The fact that the work is > distributed and will mostly be incurred by people other than the ones > advocating the change doesn't improve matters. As I mentioned, getloadavg() appears to be supported on 3 of the primary platforms we work with, so I'd say for most installations, portability issues aren't an issue ... Autoconf has a 'LOADAVG' check already, so what is so problematic about using that to enable/disable that feature?
  if ( loadavg available on OS && enabled in postgresql.conf ) {
    operate on it
  } else if ( loadavg not available on OS && enabled ) {
    noop with a WARN level error that it's not available
  }
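The pseudocode above amounts to "use the load average where the platform has it, otherwise warn and carry on". A runtime analogue, offered as a sketch and not as the configure-time check under discussion: Python exposes the C library routine as `os.getloadavg()` on Unix platforms that have it, and raises `OSError` when the value cannot be obtained. The function name `read_load` and the warning texts are made up for this example.

```python
import os
import warnings


def read_load(enabled):
    """Return the 1-minute load average, or None if the check is
    switched off or the platform cannot supply the number."""
    if not enabled:
        return None
    getloadavg = getattr(os, "getloadavg", None)
    if getloadavg is None:
        # Feature enabled in the config but missing on this OS: no-op.
        warnings.warn("load average not available on this OS; check skipped")
        return None
    try:
        return getloadavg()[0]
    except OSError:
        warnings.warn("load average could not be read; check skipped")
        return None
```

A caller treats `None` as "no information" and falls back to whatever limits it would apply without the load check.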
On Wed, 25 Apr 2001, Tom Lane wrote: > The Hermit Hacker <scrappy@hub.org> writes: > > Autoconf has a 'LOADAVG' check already, so what is so problematic about > > using that to enabled/disable that feature? > > Because it's tied to a GNU getloadavg.c implementation, which we'd have > license problems with using. It's part of the standard C library in FreeBSD. Any other platforms have it built in? Vince. -- ========================================================================== Vince Vielhaber -- KA8CSH email: vev@michvhf.com http://www.pop4.net 56K Nationwide Dialup from $16.00/mo at Pop4 Networking Online Campground Directory http://www.camping-usa.com Online Giftshop Superstore http://www.cloudninegifts.com ==========================================================================
On Wed, 25 Apr 2001, Vince Vielhaber wrote: > On Wed, 25 Apr 2001, Tom Lane wrote: > > > The Hermit Hacker <scrappy@hub.org> writes: > > > Autoconf has a 'LOADAVG' check already, so what is so problematic about > > > using that to enabled/disable that feature? > > > > Because it's tied to a GNU getloadavg.c implementation, which we'd have > > license problems with using. > > It's part of the standard C library in FreeBSD. Any other platforms > have it built in? As has been mentioned, Solaris and Linux also have it ...
On Wed, 25 Apr 2001, Tom Lane wrote: > I'm still concerned about portability issues, and about whether load > average is really the right number to be looking at, however. It's worked for Sendmail for how many years now, and the code is there to use, with all portability issues resolved for every platform they use ... and a growing number of platforms appear to have the mechanisms already built into their C libraries ...
On Wed, 25 Apr 2001, The Hermit Hacker wrote: > On Wed, 25 Apr 2001, Vince Vielhaber wrote: > > > On Wed, 25 Apr 2001, Tom Lane wrote: > > > > > The Hermit Hacker <scrappy@hub.org> writes: > > > > Autoconf has a 'LOADAVG' check already, so what is so problematic about > > > > using that to enabled/disable that feature? > > > > > > Because it's tied to a GNU getloadavg.c implementation, which we'd have > > > license problems with using. > > > > It's part of the standard C library in FreeBSD. Any other platforms > > have it built in? > > As has been mentioned, Solaris and Linux also have it ... But what's in FreeBSD's standard library isn't GNU. Vince.
On Thu, 26 Apr 2001, Vince Vielhaber wrote: > On Wed, 25 Apr 2001, The Hermit Hacker wrote: > > > On Wed, 25 Apr 2001, Vince Vielhaber wrote: > > > > > On Wed, 25 Apr 2001, Tom Lane wrote: > > > > > > > The Hermit Hacker <scrappy@hub.org> writes: > > > > > Autoconf has a 'LOADAVG' check already, so what is so problematic about > > > > > using that to enabled/disable that feature? > > > > > > > > Because it's tied to a GNU getloadavg.c implementation, which we'd have > > > > license problems with using. > > > > > > It's part of the standard C library in FreeBSD. Any other platforms > > > have it built in? > > > > As has been mentioned, Solaris and Linux also have it ... > > But what's in FreeBSD's standard library isn't GNU. Wouldn't matter if it was, it's part of the OS's standard library ... unless you mean to pull it in and use it with the distribution, which I think might be a bad idea ... if we pull anything in, sendmail's would be best ... FreeBSD's will have had anything required for non-FreeBSD systems yanked out, if it was ever there, while sendmail's already has all the 'hooks' in it ...
On Thu, 26 Apr 2001, The Hermit Hacker wrote: > On Thu, 26 Apr 2001, Vince Vielhaber wrote: > > > On Wed, 25 Apr 2001, The Hermit Hacker wrote: > > > > > On Wed, 25 Apr 2001, Vince Vielhaber wrote: > > > > > > > On Wed, 25 Apr 2001, Tom Lane wrote: > > > > > > > > > The Hermit Hacker <scrappy@hub.org> writes: > > > > > > Autoconf has a 'LOADAVG' check already, so what is so problematic about > > > > > > using that to enabled/disable that feature? > > > > > > > > > > Because it's tied to a GNU getloadavg.c implementation, which we'd have > > > > > license problems with using. > > > > > > > > It's part of the standard C library in FreeBSD. Any other platforms > > > > have it built in? > > > > > > As has been mentioned, Solaris and Linux also have it ... > > > > But what's in FreeBSD's standard library isn't GNU. > > Wouldn't matter if it was, its part of the OSs standard library ... unless > you mean to pull it in and use it with the distribution, which I think > might be a bad idea ... if we pull anything in, sendmail's would be best > ... FreeBSD's will have had anything required for non-FreeBSD systems > yanked out, if it was ever there, while sendmail's already has all the > 'hooks' in it ... That wasn't what I was saying at all. Vince.
>Nathan Myers wrote: >> On Mon, Apr 23, 2001 at 03:09:53PM -0300, The Hermit Hacker wrote: >> > >> > Anyone thought of implementing this, similar to how sendmail does it? If >> > load > n, refuse connections? >> > ... >> > If nobody is working on something like this, does anyone but me feel that >> > it has merit to make use of? I'll play with it if so ... >> >> I agree that it would be useful. Even more useful would be soft load >> shedding, where once some load average level is exceeded the postmaster >> delays a bit (proportionately) before accepting a connection. > > Or have the load check on AtXactStart, and delay new > transactions until load is back below x, where x is > configurable per user/group plus some per database scaling > factor. How is this different than limiting the number of backends that can be running at once? It would seem to me that a user that has a "delayed" startup is going to think there's something wrong with the server and keep trying, whereas a message like "too many clients - try again later" explains what's really going on. len morgan
The soft load shedding idea is great. Along the lines of "lots of idle connections" is the issue with the simple number of connections. I suspect in most real world apps you'll have logic+web serving on a set of frontends talking to a single db backend (until clustering is really nailed). The issue we hit is that if all the frontends have 250 maxclients, the number on the backend goes way up. This falls in the connection pooling realm, and could be implemented with the client lib presenting a server view, so apps would simply treat the pooler as a local server which would allocate connections as needed from a pool of persistent connections. This also has a benefit in cases (cgi) where persistent connections cannot be maintained properly. I suspect we've got a 10% duty cycle on the persistent connections we set up... This problem is predicated on the idea that holding a connection is not negligible (i.e., 5,000 connections open is worse than 200) for the same loads. Not sure if that's the case... AZ "Nathan Myers" <ncm@zembu.com> wrote in message news:20010423121105.Y3797@store.zembu.com... > On Mon, Apr 23, 2001 at 03:09:53PM -0300, The Hermit Hacker wrote: > > > > Anyone thought of implementing this, similar to how sendmail does it? If > > load > n, refuse connections? > > ... > > If nobody is working on something like this, does anyone but me feel that > > it has merit to make use of? I'll play with it if so ... > > I agree that it would be useful. Even more useful would be soft load > shedding, where once some load average level is exceeded the postmaster > delays a bit (proportionately) before accepting a connection. > > Nathan Myers > ncm@zembu.com
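The pooling idea above can be illustrated with a fixed-size pool that hands out persistent connections on demand. This is only a sketch of the concept: `ConnectionPool` is a hypothetical class, and `connect` is a caller-supplied factory standing in for whatever client library opens the real database connection. With F frontends each allowed 250 clients, the backend can see up to 250 × F connections; a pool of size S per frontend caps that at S × F.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool of persistent connections (hypothetical helper)."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())    # open the persistent connections once

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)         # hand it back for reuse
```

An application would wrap each request in `with pool.connection() as conn:`, so a connection is held only for the part of the request that actually talks to the database.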
Tom Lane wrote: > A less dangerous way of approaching it might be to have an option > whereby the postmaster invokes 'uptime' via system() every so often > (maybe once a minute?) and throttles on the basis of the results. > The reaction time would be poorer, but security would be a whole lot > better. Rather than do system("uptime") and incur the process start-up each time, you could do fp = popen("vmstat 60", "r"), then just read the fp. I believe vmstat is fairly standard. For those systems which don't support vmstat, it could be faked with a shell script. You could write the specific code to handle each arch, but it's a royal pain, because it's so different for many archs. Another possibility could be to read from /proc for those systems that support /proc. But I think this will be more variable than the output from vmstat. Vmstat also has the added benefit of providing other information. I agree with Tom about not wanting to open up /dev/kmem, due to potential security problems. Neal
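The "start vmstat once and keep reading from it" approach could be sketched as follows, assuming a `vmstat` binary that accepts an interval argument and prints a column-name header followed by numeric rows. Column names and order differ between operating systems, so the parser below assumes no particular layout. The function names `parse_row` and `watch_vmstat` are hypothetical.

```python
import subprocess


def parse_row(header, row):
    """Pair column names from a vmstat header line with one data row.

    Non-numeric fields are skipped. If a column name repeats, the last
    value wins. A row whose width does not match the header yields {}.
    """
    names = header.split()
    values = row.split()
    if len(names) != len(values):
        return {}
    out = {}
    for name, value in zip(names, values):
        try:
            out[name] = float(value)
        except ValueError:
            pass
    return out


def watch_vmstat(interval=60):
    """Yield one parsed sample per interval from a single long-lived vmstat."""
    proc = subprocess.Popen(["vmstat", str(interval)],
                            stdout=subprocess.PIPE, text=True)
    header = None
    for line in proc.stdout:
        fields = line.split()
        if not fields:
            continue
        if not fields[0].replace(".", "", 1).isdigit():
            header = line        # remember the most recent header line
            continue
        if header is not None:
            yield parse_row(header, line)
```

Only one child process is started, which is the point of the suggestion: each sample costs a read on a pipe and not a fork and exec.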
Vince Vielhaber <vev@michvhf.com> writes: > On Wed, 25 Apr 2001, The Hermit Hacker wrote: >> On Wed, 25 Apr 2001, Vince Vielhaber wrote: >> > On Wed, 25 Apr 2001, Tom Lane wrote: > Because it's tied to a GNU getloadavg.c implementation, which we'd have > license problems with using. > > It's part of the standard C library in FreeBSD. Any other platforms > have it built in? >> >> As has been mentioned, Solaris and Linux also have it ... > But what's in FreeBSD's standard library isn't GNU. Obviously I confused some people. What Autoconf's LOADAVG macro actually does is (1) check to see if system has a getloadavg() library routine, and if so, set up to use that. Otherwise (2) apply a bunch of ad-hoc checks to find out whether a GNU-specific getloadavg module can be used. That module isn't actually included with autoconf; I imagine the one they have in mind is the one in GNU make. Therefore, Autoconf's macro is useless to us as a means of configuring load average support, because we won't be using GNU make's getloadavg module. The Sendmail loadavg code should be more friendly from a licensing standpoint, but IT HAS PRIVILEGE PROBLEMS. Reading /dev/kmem isn't something that we should expect to be able to do in Postgres. In short, I haven't seen any evidence that we have a portable solution available. Please don't reply (yet again) "It works on $MYSYSTEM, therefore there's no problem." If you want to implement this feature then you need to take responsibility for making it work everywhere. regards, tom lane