Thread: limiting resources to users
Hi:
Is there a way in Postgres to limit how much CPU, memory, and other resources a certain DB gets? A MySQL DB that I'm now replacing with PG has been having problems with run-away users who pound one DB with intense processes running periodically in cron jobs. The effect is that it takes up all the resources, and the users of other DBs suffer because the CPU is pegged servicing the first guy.
The PG version will have one PG instance with many DBs. One set of these DBs is vital for operations and should have the highest priority. Another set of DBs is not vital, but their users are the culprits who consume all the resources. I want to give DBs A, B, and C the highest priority and DBs D, E, and F the lowest. If a DB F user is taking up all the resources and a DB B user runs a query, I want to service the DB B user first and backburner the DB F user until the DB B user is done.
Thanks !
Gauthier, Dave wrote:
> Is there a way in Postgres to limit how much CPU, memory, and other
> resources a certain DB gets?
> [...]
> I want to give DBs A, B, and C the highest priority and DBs D, E, and F
> the lowest.

On Solaris 10 I'd manage that with zones, running different instances in different zones... You'd also want to segregate the priority databases onto a separate RAID volume to remove I/O contention.
Gauthier, Dave wrote:
> Is there a way in Postgres to limit how much CPU, memory, and other
> resources a certain DB gets?
> [...]

Buy a chainsaw and cut one hand from each user (Ok --->[])

Maybe you can play with renice (or even with the -N switch in start-stop-daemon, but I don't know if that's a regular package or just a Debian one).

Since nice allows priorities from -20 to +19, it should give you the flexibility you need: as soon as the higher-priority (reniced) daemon claims CPU, the system will favour it over the other processes.

JY

--
There is no sincerer love than the love of food.
-- George Bernard Shaw
In response to "Gauthier, Dave" <dave.gauthier@intel.com>:
> Is there a way in Postgres to limit how much CPU, memory, and other
> resources a certain DB gets? A MySQL DB that I'm now replacing with PG
> has been having problems with run-away users who pound one DB with
> intense processes running periodically in cron jobs.
> [...]

You can try various tricks with nice. Have you conclusively determined that the bottlenecked resource is CPU? In my experience, it's much more likely to be disk I/O, and nice doesn't help with that.

I don't believe there's any way to control this just using PostgreSQL.

One thing to do is to school the users who are hogging resources. Cancel their queries and send them apologetic emails about how you're sorry, but you have to allow other people to use the system as well. Hopefully you'll get them to reconsider their queries and improve their efficiency. Again, based on my experience, the queries that are hogging the system can probably be significantly optimized. However, if you don't put pressure on the users to do so, they're not going to bother.

A better solution would probably be to buy a second database server. Put the mission-critical DBs on one physical system and the rest on another, and your problem goes away. Honestly, you might want to consider this anyway, since it seems like your existing system is nearing overload if it can't maintain reasonable performance under concurrent load.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Bill Moran wrote:
> You can try various tricks with nice. Have you conclusively determined
> that the bottlenecked resource is CPU? In my experience, it's much more
> likely to be disk IO, and nice doesn't help with that.

I agree; unless you use SCSI 640 with an SSD front cache, the hard disks are always the bottleneck.

...

> A better solution would probably be to buy a second database server. Put
> the mission-critical DBs on one physical system and the rest on another
> system and your problem goes away.

Or maybe use virtual machines (with a minimum of daemons running on each); that way you'll be able to fine-tune each server (and the amount of RAM each one gets), but you'll need plenty of horsepower available, and as usual YMMV.

JY

--
BOHICA: Bend over, here it comes again.
So there's no way for PostgreSQL to have something like profiles to limit resource usage per user? Is this feature planned for the next version? 8-)

Jean-Yves F. Barbier wrote:
> Maybe you can play with renice (or even with the -N switch in
> start-stop-daemon, but I don't know if that's a regular package
> or just a Debian one).
>
> Since nice allows priorities from -20 to +19, it should give you
> the flexibility you need: as soon as the higher-priority (reniced)
> daemon claims CPU, the system will favour it over the other processes.
Gauthier, Dave wrote:
> Is there a way in Postgres to limit how much CPU, memory, and other
> resources a certain DB gets? A MySQL DB that I'm now replacing with PG
> has been having problems with run-away users who pound one DB with
> intense processes running periodically in cron jobs. The effect is that
> it takes up all the resources, and the users of other DBs suffer because
> the CPU is pegged servicing the first guy.

As others have noted, it may not just be CPU. I/O is usually the bottleneck in a DBMS. CPU usage may show up, but if it's shown as "IOWait" then it's actually time a process spends waiting for disk I/O, not "real" CPU time used.

Whether the issue is with disk I/O or with CPU, PostgreSQL does not have much in the way of facilities to set priorities for users or queries.

For CPU, as others have noted, you can use "nice", but it's a bit clumsy. Personally I'd recommend installing a C-language function "nice_backend(prio)" that renices the calling backend to "prio". Limit "prio" to >= 0, and provide a second function "nice_backend_super(pid, prio)" that can renice any backend pid and set any priority, but is usable only by the superuser. I don't know of any canned implementations of these, but it shouldn't be hard to whip them up based on the C user-defined function documentation and examples, using the nice(2) system call to do the dirty work.

I/O is harder. Some operating systems offer I/O priorities for processes, like Linux's ionice, and you'd think you could use these in a similar way to how you use 'nice'. Unfortunately, that won't work well, because a lot of the work PostgreSQL does - especially disk writes - is done via a helper background writer process working from memory shared by all backends. Similarly, the write-ahead logs are managed by their own process via shared memory. So it's very hard to effectively give one user priority over another for writes. ionice should be somewhat effective for reads, though.

Additionally, PostgreSQL doesn't have any notion of locking priority. A "higher priority" transaction has no way to automatically terminate a lower priority transaction that's doing slow work while holding a required lock. You can manually terminate the problem process with pg_cancel_backend(...) after identifying that it holds desired locks, but this is painful to say the least. This issue only comes up where the transactions are running in the same database, not just the same cluster, but it can be a pain. Since it sounds like your priorities are per-database, not per-user or per-query, it probably doesn't matter to you, but I thought I'd mention it.

> The PG version will have one PG instance with many DBs. One set of
> these DBs is vital for operations and should have the highest priority.

The best bet here is to have multiple PostgreSQL clusters running on the machine - if you can't have separate hardware. Have one cluster (ie: a postmaster with its own listening port, data dir, write-ahead logs, etc.) set to a higher CPU and I/O priority than the other when it starts up. Those priorities will be inherited by child backends, so connections to that cluster will have a higher priority on system resources.

Do the same for the badly behaved users - put them in their own cluster, and renice them down for lower-priority access to CPU and I/O resources. You could even be mean and (if running on Linux) put them in the "Idle" I/O priority, though that'd probably result in *awful* performance.
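For example, starting the low-priority cluster with both a CPU and an I/O handicap might look something like this. This is only a sketch - the data directory, port and log path are made up, and it assumes Linux with util-linux's ionice - so adapt it to however your init scripts invoke pg_ctl:

  # start the low-priority cluster niced down and in the lowest best-effort I/O class
  $ nice -n 10 ionice -c2 -n7 \
      pg_ctl -D /var/lib/postgresql/lowprio \
             -o "-p 5433" \
             -l /var/log/postgresql/lowprio.log start

Backends forked by that postmaster inherit both settings, so every connection to that cluster competes for CPU and disk at a disadvantage relative to the high-priority cluster.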
The downside of separating DBs into multiple clusters grouped by priority is that each cluster runs on a different port, it must be backed up separately, and all tuning and configuration must be done separately - including user management - so there's more work involved.

Unfortunately, you must also allocate separate blocks of shared memory to the clusters. Even when the high-priority cluster is idle, the low-priority one will not be able to use its shared memory for caching, and vice versa. So you may need more RAM.

--
Craig Ringer
Craig Ringer wrote:
> Gauthier, Dave wrote:
>> Is there a way in Postgres to limit how much CPU, memory, and other
>> resources a certain DB gets?
>> [...]

I've tidied up my previous explanation into a wiki page, since it'll come up again. Your comments and suggestions would be appreciated:

http://wiki.postgresql.org/wiki/Prioritizing_databases_by_separating_into_multiple_clusters

I've also added a general explanation of the issues with prioritizing users/queries/connections/databases:

http://wiki.postgresql.org/wiki/Priorities

Both pages are very much draft quality. Comments, edits, etc. would be most welcome.

--
Craig Ringer
Actually, limiting resources on a per-DB basis would work for me too.

I think Oracle had a thing where you could limit resources, or at least prioritize users.

Someone else was reporting this to me, and again, it was a MySQL DB that I'll eventually be replacing with PG. I'll inherit this problem too, so I'm being proactive in looking for a fix. He said that the CPU was pegged (no mention of I/O). He started killing processes left and right until performance was back (it was either that or a DB restart to clear the slate).

Many users use cron jobs to "refresh" reports, and they're not squeamish about refreshing very often. What I'd like to see is that they get their 3% of the server's attention to do this and no more, leaving the other 97% free for non-abusive users.

Thanks for all the help on this !!

-dave
Gauthier, Dave wrote:
> Actually, limiting resources on a per-DB basis would work for me too.
>
> I think Oracle had a thing where you could limit resources, or at least
> prioritize users.

Pg isn't Oracle - for which we can be alternately extremely thankful and somewhat frustrated. Lots of shiny features vs hellish admin, freaky SQL syntax, and $myeyesarebleeding.

> Someone else was reporting this to me, and again, it was a MySQL DB
> that I'll eventually be replacing with PG. I'll inherit this problem
> too, so I'm being proactive in looking for a fix. He said that the
> CPU was pegged (no mention of I/O). He started killing processes left
> and right until performance was back (it was either that or a DB
> restart to clear the slate).

Argh. It could just as easily be disk I/O. This:

  for i in `seq 1 20`; do
    ( dd if=/dev/md0 bs=1M seek=$(($i * 1000)) of=/dev/null & )
  done

results in:

  top - 14:26:20 up 3 days, 1:43, 3 users, load average: 4.70, 3.02, 1.41
  Tasks: 255 total, 15 running, 240 sleeping, 0 stopped, 0 zombie
  Cpu(s): 30.1%us, 65.0%sy, 0.0%ni, 0.0%id, 2.5%wa, 0.5%hi, 2.0%si, 0.0%st
  Mem:   4055728k total,  3776220k used,   279508k free,   750588k buffers
  Swap:  2120544k total,     4572k used,  2115972k free,  2245336k cached

    PID USER    PR NI  VIRT  RES SHR S %CPU %MEM   TIME+  COMMAND
   1926 syslog  20  0 34552 1216 956 S   81  0.0  1:41.32 rsyslogd
  32619 root    20  0  4152 1772 632 R    7  0.0  0:00.46 dd
  32631 root    20  0  4152 1772 632 D    6  0.0  0:00.54 dd
  32623 root    20  0  4152 1776 632 D    6  0.0  0:00.51 dd
  ...

which could be confused for a busy CPU, but is really load caused by disk I/O. Most of the `dd' processes are in 'D' state - ie uninterruptible sleep in a system call - and if you check "wchan" with "ps" you'll see that they're sleeping waiting for disk I/O (there's an example of that check at the end of this mail).

Randomly killing processes is *not* a good way to tackle this - as I think you already knew. Tools like `ps' (especially using custom formats with wchan), `vmstat', `top', `iotop', `blktrace' + `blkparse', `btrace', `sar', alt-sysrq-t, etc. can help you learn more about what is going on without having to employ such brute-force tactics.

You might find that the worst performance issues are caused by terribly written queries, poorly structured tables or views, lack of basic DB maintenance, bad DB tuning, etc. It need not be the users' eagerness alone - and you may find that a properly set up database can take the reporting load and the critical load effortlessly, or close enough as not to bother with extreme measures like separating the DBs into multiple clusters.
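For the record, the wchan check mentioned above is just a ps format option (GNU procps syntax assumed; swap "postgres" in for "dd" to look at database backends instead):

  $ ps -C dd -o pid,stat,wchan:25,comm
  $ ps -C postgres -o pid,stat,wchan:25,comm

A 'D' in the STAT column plus a wchan entry means the process is blocked in the kernel, almost always waiting on disk.

--
Craig Ringer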
Craig Ringer wrote:
> I've also added a general explanation of the issues with prioritizing
> users/queries/connections/databases:
>
> http://wiki.postgresql.org/wiki/Priorities

This is a very common question, and I'm glad to have somewhere we can just point people to when it comes up again. I just spent some time reviewing/improving your article, and I pulled the disclaimer off when I was done. It's a solid intro to this topic, and I linked it into the wiki infrastructure a bit better too (i.e. adding it to http://wiki.postgresql.org/wiki/Frequently_Asked_Questions and cleaning up where it shows up on the main Performance Optimization page).

The main thing your discussion missed - and where some more detail could be provided beyond what I just added - is how to answer the regular requests we see for "how can I lower the priority of this big batch job that's monopolizing the system?". People don't necessarily want to do that from the backend itself, which is what your comments focused on. It can be enough to find the process via the script that launched the batch job, and then have it issue the nice call. I added some brief comments about how you can look at pg_stat_activity to find the actual backend pid of something from outside of the client itself, to start documenting that process.

It would be nice (ouch) to provide a better example of how to do that at some point. Sometimes, for example, I'll save the pid of the spawned psql process and use that to look up the backend pid assigned; not hard to do if you've seen an example or know how this all fits together, but not really an obvious technique either.
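A minimal sketch of the "from outside the client" approach, assuming the batch job's query text is distinctive enough to match on (the LIKE pattern and nice level here are invented for illustration):

  $ pid=$(psql -At -c "SELECT procpid FROM pg_stat_activity \
          WHERE current_query LIKE 'SELECT%FROM big_report%' LIMIT 1")
  $ [ -n "$pid" ] && renice 10 -p "$pid"

--
Greg Smith  2ndQuadrant  Baltimore, MD
PostgreSQL Training, Services and Support  greg@2ndQuadrant.com  www.2ndQuadrant.com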
On 1/12/2009 12:30 AM, Greg Smith wrote:
> Craig Ringer wrote:
>> I've also added a general explanation of the issues with prioritizing
>> users/queries/connections/databases:
>>
>> http://wiki.postgresql.org/wiki/Priorities
> I just spent some time reviewing/improving your article, and I pulled
> the disclaimer off when I was done.

Thanks. I really appreciate your taking the time. The article is clearly improved. Looking at the diffs, it's clear I need to cut the fluff from my writing, as your edits make the whole thing a lot clearer. Keeping it succinct is important, and I tend to forget that.

> I added some brief comments about how you can look at pg_stat_activity
> to find the actual backend pid of something from outside of the client
> itself, to start documenting that process. It would be nice (ouch) to
> provide a better example of how to do that at some point.

Good point. I assume you look up the associated backend by looking up the source IP and port of the client with `netstat', `lsof', etc., and matching that to pg_stat_activity? eg:

  $ lsof -F 'n' -P -i -n -a -p 29951
  p29951
  n127.0.0.1:39996->127.0.0.1:5432

... from which you can query pg_stat_activity to get the backend pid:

  $ psql -c "select procpid from pg_stat_activity \
             where client_addr = '127.0.0.1' AND client_port = '39996';"
   procpid
  ---------
     29957
  (1 row)

Surely there's a nicer way to get the local IP and port of all of a process's connected sockets without having to rely on lsof, though?

It makes me wonder if it'd be handy to have a command-line option for psql that caused it to spit the backend pid out on stderr.

--
Craig Ringer
Craig Ringer wrote:
> I assume you look up the associated backend by looking up the source
> IP and port of the client with `netstat', `lsof', etc., and matching
> that to pg_stat_activity?

There's a bunch of ways I've seen this done:

1) If you spawn the psql process with bash using "&", you can then find its pid with "$!", then chain through the process tree with ps and pg_stat_activity as needed to figure out the backend pid.

2) If you know the query being run and it's unique (often the case with batch jobs run daily, for example), you can search for it directly in the query text of pg_stat_activity.

3) Sometimes the only queries you want to re-nice are local, while everything else is remote. You might filter down possible pids that way.

4) Massage data from netstat, lsof, or similar tools to figure out which process you want.

> It makes me wonder if it'd be handy to have a command-line option for
> psql that caused it to spit the backend pid out on stderr.

Inspired by this idea, I just thought of yet another approach. Put this at the beginning of something you want to track:

  COPY (SELECT pg_backend_pid()) TO '/place/to/save/pid';

Not so useful if there's more than one of the query running at once, but in the "nice a batch job" context it might be usable.

--
Greg Smith  2ndQuadrant  Baltimore, MD
PostgreSQL Training, Services and Support  greg@2ndQuadrant.com  www.2ndQuadrant.com
On 1/12/2009 11:33 AM, Greg Smith wrote:
> Craig Ringer wrote:
>> I assume you look up the associated backend by looking up the source
>> IP and port of the client with `netstat', `lsof', etc., and matching
>> that to pg_stat_activity?
> There's a bunch of ways I've seen this done:
>
> 1) If you spawn the psql process with bash using "&", you can then find
> its pid with "$!", then chain through the process tree with ps and
> pg_stat_activity as needed to figure out the backend pid.

I feel like I'm missing something obvious here. How can walking the process tree starting with the psql pid help you find the backend pid? The backend and client have no relationship in the process tree. At some point you have to match the (ip, port) tuple for the client's connection against pg_stat_activity - what you've listed separately as (4). Even that won't help if a unix socket is in use, since client_addr is null and client_port is -1.

So: knowing the client (say, psql) pid, how can you find the backend pid without relying on something like lsof or netstat to identify the source (ip, port) combo used by the particular client instance whose pid you know?

> 4) Massage data from netstat, lsof, or similar tools to figure out
> which process you want.

In other words, how is this distinct from (1)?

>> It makes me wonder if it'd be handy to have a command-line option for
>> psql that caused it to spit the backend pid out on stderr.
> Inspired by this idea, I just thought of yet another approach. Put this
> at the beginning of something you want to track:
>
> COPY (SELECT pg_backend_pid()) TO '/place/to/save/pid';

Good point. It's a pity the filename arg to COPY can't be an expression, since then it'd be easy to write to a unique file name, but it's obvious why it can't. A PL/PgSQL EXECUTE statement gets around that, though.

That way you can write a file with the backend pid as its file name into a directory watched by a helper process. When a file appears in the watched dir, the helper renices (and ionices, if appropriate) the process with the same pid as the file name, then deletes the file. You could even include the desired nice level, or other params, in the file contents generated by the query passed to COPY.

I think that's worth setting up here, actually - wrap the COPY up in a 'security definer' PL/PgSQL function named, say, "nice_self(...)" that restricts the permitted nice level and I/O class, and expose it to batch jobs. I'll have to play with that.
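The helper itself could be as simple as this - a rough sketch only; the spool directory, default nice level and polling interval are arbitrary, and a real version would want inotify plus sanity checks on who dropped the file:

  #!/bin/sh
  # renice backends whose pids appear as file names in a spool directory;
  # each file's contents hold the requested nice level
  WATCH_DIR=/var/run/pg_renice
  while true; do
      for f in "$WATCH_DIR"/*; do
          [ -f "$f" ] || continue
          pid=$(basename "$f")
          prio=$(cat "$f")
          renice "${prio:-10}" -p "$pid"
          rm -f "$f"
      done
      sleep 5
  done

--
Craig Ringer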
On 1 Dec 2009, at 4:55, Craig Ringer wrote:
> I feel like I'm missing something obvious here. How can walking the
> process tree starting with the psql pid help you find the backend pid?
> The backend and client have no relationship in the process tree.
> [...]
> So: knowing the client (say, psql) pid, how can you find the backend
> pid without relying on something like lsof or netstat to identify the
> source (ip, port) combo used by the particular client instance whose
> pid you know?

I think Greg intended this to be used from the process that needs re-nicing. If you have a batch job that you always want to run at a different nice level, you can get its pid that way and use it to re-nice the process.

You can also match it up to procpid in pg_stat_activity to get other info about the backend, but I don't think there's anything in there that you'd need to know at that point (you got the pid to re-nice already, after all).
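Something like this, in other words (nightly_report.sh is just a stand-in for whatever batch job you want demoted; note that this renices the client-side script, not the backend serving it):

  $ ./nightly_report.sh &
  $ renice -n 10 -p $!

Alban Hertroys

--
Screwing up is the best way to attach something to the ceiling.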
I think I'm just going to have to run the priority DB on a different server (Linux) to make sure it doesn't get abused. I can see no other way.
Craig Ringer wrote:
> On 1/12/2009 11:33 AM, Greg Smith wrote:
>> 1) If you spawn the psql process with bash using "&", you can then find
>> its pid with "$!", then chain through the process tree with ps and
>> pg_stat_activity as needed to figure out the backend pid.
>
> I feel like I'm missing something obvious here. How can walking the
> process tree starting with the psql pid help you find the backend pid?
> The backend and client have no relationship in the process tree. At
> some point you have to match the (ip, port) tuple for the client's
> connection against pg_stat_activity - what you've listed separately as
> (4). Even that won't help if a unix socket is in use, since
> client_addr is null and client_port is -1.

It turns out I completely misremembered what I did the last time, so you're right to call me on this. As I confirmed when I tried to do this myself again, the two port numbers are completely unrelated via anything you can see in ps.

What I actually did was look at pg_stat_activity, note what processes were around, spawn the new one, and then look at pg_stat_activity again to figure out which it is. Not really a great approach, but a usable one for some situations.

--
Greg Smith  2ndQuadrant  Baltimore, MD
PostgreSQL Training, Services and Support  greg@2ndQuadrant.com  www.2ndQuadrant.com
* Greg Smith (greg@2ndquadrant.com) wrote:
> It turns out I completely misremembered what I did the last time, so
> you're right to call me on this. As I confirmed when I tried to do this
> myself again, the two port numbers are completely unrelated via anything
> you can see in ps. What I actually did was look at pg_stat_activity,
> note what processes were around, spawn the new one, and then look at
> pg_stat_activity again to figure out which it is. Not really a great
> approach, but a usable one for some situations.

Perhaps we can stick the pid in client_addr? I haven't got a good solution at the moment for UNIX domain sockets, which really does suck. The best I've found, which is by no means a good solution, is:

  ===# netstat -np | grep -A1 .PGSQL | grep -B1 <pid of psql>

(on a Linux system)

I haven't actually been following this thread, so I'm not sure what the OP was after. For other sockets (IPv4, IPv6), IP+port can be used, of course.

Thanks,

Stephen