Thread: changing sort_mem on the fly?
Is there any way to increase sort_mem without having to perform a full
restart of the DB?  Will just doing a 'kill -HUP $PID' work as long as
I've made the change in postgresql.conf first?

As a side question, are the options that will take effect with a kill
-HUP documented somewhere?

thanks!

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                          http://netllama.linux-sxs.org
On Thu, Jan 27, 2005 at 02:57:22PM -0800, Lonni J Friedman wrote:
> Is there any way to increase sort_mem without having to perform a full
> restart of the DB?  Will just doing a 'kill -HUP $PID' work as long as
> I've made the change in postgresql.conf first?
>
> As a side question, are the options that will take effect with a kill
> -HUP documented somewhere?

If you just want to change it for one connection, you can do 'set
sort_mem=newvalue;'. You might need to be an admin to do it, though.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
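[For reference, the per-session route Jim describes looks like this from a
shell. The database name and the 32768 figure are examples only; sort_mem
is measured in kB, and the setting dies with the connection:]

```shell
# Per-session only: raise sort_mem to 32 MB for this one connection,
# run the big query, and let the setting vanish when psql exits.
# "mydb" and the value are placeholders.
psql mydb <<'EOF'
SET sort_mem = 32768;   -- in kB; renamed work_mem as of 8.0
SHOW sort_mem;          -- confirm the new value took effect
-- ... run the large sort/ORDER BY query here ...
RESET sort_mem;         -- optional: revert to the postgresql.conf value
EOF
```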
On Thu, Jan 27, 2005 at 06:23:21PM -0600, Jim C. Nasby wrote:
>
> If you just want to change it for one connection, you can do 'set
> sort_mem=newvalue;'. You might need to be an admin to do it, though.

Ordinary users can set sort_mem/work_mem, at least in 8.0.0 and
earlier.  Does setting that variable per session actually have an
effect, and if it does, does allowing ordinary users to set it
present a risk?
-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/
I usually use pg_ctl reload instead of kill

On Thu, 27 Jan 2005 14:57:22 -0800, Lonni J Friedman wrote
> Is there any way to increase sort_mem without having to perform a
> full restart of the DB?  Will just doing a 'kill -HUP $PID' work as
> long as I've made the change in postgresql.conf first?
>
> As a side question, are the options that will take effect with a kill
> -HUP documented somewhere?
>
> thanks!

-- 
Jonel Rienton
http://blogs.road14.com
Software Developer, *nix Advocate
OK, but does that actually work to change sort_mem for all future
connections?

On Thu, 27 Jan 2005 19:25:51 -0600, Jonel Rienton <jonel@road14.com> wrote:
> I usually use pg_ctl reload instead of kill

-- 
L. Friedman                                    netllama@gmail.com
On Thu, 27 Jan 2005 18:23:21 -0600, Jim C. Nasby <decibel@decibel.org> wrote:
> If you just want to change it for one connection, you can do 'set
> sort_mem=newvalue;'. You might need to be an admin to do it, though.

I need to do it for all future connections, not just one.
-- 
L. Friedman                                    netllama@gmail.com
On Thu, Jan 27, 2005 at 06:23:05PM -0700, Michael Fuhr wrote:
> Ordinary users can set sort_mem/work_mem, at least in 8.0.0 and
> earlier.  Does setting that variable per session actually have an
> effect, and if it does, does allowing ordinary users to set it
> present a risk?

I ass-u-me that the setting actually takes effect. You could test it by
setting it high enough to send a system into swapping and running a
query that will use enough sort_mem. If it starts swapping, it
worked. :)

Yes, there is a risk in allowing ad-hoc settings; you can starve the
machine for memory.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
On Thu, Jan 27, 2005 at 05:52:41PM -0800, Lonni J Friedman wrote:
> I need to do it for all future connections, not just one.

I'm guessing pg_ctl reload would take effect on all new connections.
Try it, and see what psql -c 'show sort_mem' says.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
On Thu, 27 Jan 2005 20:05:19 -0600, Jim C. Nasby <decibel@decibel.org> wrote:
> I'm guessing pg_ctl reload would take effect on all new connections.
> Try it, and see what psql -c 'show sort_mem' says.

excellent, that worked!  thanks!
-- 
L. Friedman                                    netllama@gmail.com
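[The working sequence, spelled out. The data directory path is the common
default and may well differ on a given install:]

```shell
# 1. Edit the setting in postgresql.conf (value is in kB; 16384 = 16 MB).
#    /usr/local/pgsql/data is just the customary default location.
vi /usr/local/pgsql/data/postgresql.conf     # set: sort_mem = 16384

# 2. Tell the postmaster to re-read its config file. This is equivalent
#    to sending SIGHUP to the postmaster PID, but finds the PID for you.
pg_ctl reload -D /usr/local/pgsql/data

# 3. New connections should pick up the changed value.
psql -c 'SHOW sort_mem;'
```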
On Thu, 2005-01-27 at 20:03 -0600, Jim C. Nasby wrote:
> Yes, there is a risk allowing ad-hoc settings; you can starve the
> machine for memory.

A malicious user who can execute SELECT queries can already consume an
arbitrary amount of memory -- say, by disabling GEQO and self-joining
pg_class to itself 50 times. I'm not sure that letting users modify
sort_mem/work_mem actually increases the risk from malicious users.
Restricting this parameter to superusers only would also be a hit to
usability.

-Neil
Neil Conway <neilc@samurai.com> writes:
> A malicious user who can execute SELECT queries can already consume an
> arbitrary amount of memory -- say, by disabling GEQO and self-joining
> pg_class to itself 50 times. I'm not sure that letting users modify
> sort_mem/work_mem actually increases the risk from malicious users.

The correct place for a sysadmin to limit memory usage would be in the
ulimit settings the postmaster starts under.  Of course, Neil's argument
still holds in general: anyone who can write arbitrary queries is not
going to have any difficulty in soaking up unreasonable amounts of
resources.  Trying to restrict that would probably make the system less
useful rather than more so.

			regards, tom lane
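[Tom's ulimit suggestion would live in whatever script launches the
postmaster; backends are forked children, so they inherit the limits. A
hedged, untested sketch -- the 512 MB cap is an arbitrary example, and
setting it too low will simply abort large queries with out-of-memory
errors:]

```shell
# Cap per-process memory before starting the postmaster, so no single
# backend can run the box out of memory. Values are in kB (bash ulimit);
# 524288 kB = 512 MB is only an illustrative figure.
ulimit -v 524288    # max virtual memory per process
ulimit -d 524288    # max data segment size per process
pg_ctl start -D /usr/local/pgsql/data -l /usr/local/pgsql/postmaster.log
```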
On Fri, Jan 28, 2005 at 02:43:13AM -0500, Tom Lane wrote:
> The correct place for a sysadmin to limit memory usage would be in the
> ulimit settings the postmaster starts under.  Of course, Neil's
> argument still holds in general: anyone who can write arbitrary
> queries is not going to have any difficulty in soaking up unreasonable
> amounts of resources.

Just to clarify, I wasn't suggesting limiting sort_mem to only admins; I
was only pointing out that it's possible for a user to starve the box of
memory with it. I'd also point out that IMHO it's far more likely that
someone could accidentally hose the box with sort_mem than with a
SELECT.

In the year 2020, when there's a bunch of bored developers sitting
around wondering what to do, it might be nice to be able to apply some
security to GUC settings -- possibly just restricting them to superuser,
with a SECURITY DEFINER function that users could call instead, which
would apply its own permission checks.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
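[The wrapper-function idea might look roughly like this. Everything here
is invented for illustration -- the function name, the 256 MB cap, and
the assumption that work_mem has been made superuser-only; nothing like
this exists in the server today:]

```shell
# Hypothetical sketch: a superuser-owned SECURITY DEFINER function that
# gates how high ordinary users may raise work_mem. All names and the
# cap are made up for illustration.
psql -U postgres mydb <<'EOF'
CREATE FUNCTION set_work_mem(kb integer) RETURNS void AS $$
BEGIN
    -- apply whatever policy is wanted; here, a hard 256 MB cap
    IF kb > 262144 THEN
        RAISE EXCEPTION 'work_mem request of % kB exceeds the cap', kb;
    END IF;
    EXECUTE 'SET work_mem = ' || kb::text;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

-- let only chosen users call it
REVOKE ALL ON FUNCTION set_work_mem(integer) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION set_work_mem(integer) TO webusers;
EOF
```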
Tom Lane wrote:
> The correct place for a sysadmin to limit memory usage would be in the
> ulimit settings the postmaster starts under.  Of course, Neil's
> argument still holds in general: anyone who can write arbitrary
> queries is not going to have any difficulty in soaking up unreasonable
> amounts of resources.  Trying to restrict that would probably make the
> system less useful rather than more so.

I'm not sure I agree that there's no potential for implementing better
resource limits/quotas in PG in the future; I was just pointing out that
preventing resource consumption by malicious users would require a lot
more work than merely limiting who is allowed to set sort_mem/work_mem.
If you could implement per-user/per-connection limits on things like
processor usage or disk space consumption, I think that would be useful
to some users (e.g. people offering PG in a web hosting environment).

-Neil
On Sat, Jan 29, 2005 at 09:23:06PM +1100, Neil Conway wrote:
> If you could implement per-user/per-connection limits on things like
> processor usage or disk space consumption, I think that would be
> useful to some users (e.g. people offering PG in a web hosting
> environment).

Since you brought up the future... :)

I'd really like to see an improvement in how sort_mem/work_mem is
handled. It's currently impossible to set it to support moderately large
sorts (say, 5% of available memory) without either embedding 'set
sort_mem = blah' in your code or running the risk that at some point
your database server will start swapping. I wish there were some way to
limit total sort memory for the entire system, or at least per
connection/query.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
Jim C. Nasby wrote:
> I'd really like to see an improvement in how sort_mem/work_mem is
> handled.

So would I :) (I think it's well known that the current system is not
optimal.) Do you have any thoughts on how to improve it?

-Neil
On Sun, Jan 30, 2005 at 08:50:49PM +1100, Neil Conway wrote:
> Do you have any thoughts on how to improve it?

See
http://archives.postgresql.org/pgsql-general/2003-06/msg01072.php
and
http://archives.postgresql.org/pgsql-general/2003-06/msg01089.php

There could certainly be better ways than what I proposed, too. It might
be worth investigating how the 'big 3' handle it, though some of what
they do might be patented.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
"Jim C. Nasby" <decibel@decibel.org> writes: > On Sun, Jan 30, 2005 at 08:50:49PM +1100, Neil Conway wrote: >> Do you have any thoughts on how to improve it? > See http://archives.postgresql.org/pgsql-general/2003-06/msg01072.php > and http://archives.postgresql.org/pgsql-general/2003-06/msg01089.php All of this falls down on twin problems: (1) we have no portable way to find out how much RAM is really available, and (2) the planner has to make a cost estimate for the sort in advance of actual runtime --- which requires making some assumption about the value of sort_mem that will be used. regards, tom lane
On Sun, Jan 30, 2005 at 01:05:15PM -0500, Tom Lane wrote:
> All of this falls down on twin problems: (1) we have no portable way
> to find out how much RAM is really available, and (2) the planner has
> to make a cost estimate for the sort in advance of actual runtime ---
> which requires making some assumption about the value of sort_mem that
> will be used.

As I said, I'm sure there are better minds that can come up with better
ideas than I can. :) I personally think this is important enough to
warrant a TODO so those minds can think about it at some point, but
that's just my opinion.

As for your two points: if you mean how much RAM is available for
sorting, I assumed there would be a GUC for that, although
effective_cache_size might be used. I thought that the planner did
estimate sort size when it called the sort code, but it's been a long
time since I looked at it and it was somewhat over my head anyway.
Since the planner knows how many rows will be going into the sort and
how wide they are, ISTM it should be able to estimate how much memory
will be needed. Even if it can't, I think you could still get by with
just keeping track of how much memory a sort is using when it asks for
more memory. At some point it would hit a threshold and would then
spill to disk.

BTW, if it would be useful, I could test sort speeds for sorts that
spill to disk with different sort_mem settings. That would help answer
the question of how much it would hurt for a sort that spills to disk
to give back some of its memory so that other sort operations wouldn't
spill to disk.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
"Jim C. Nasby" <decibel@decibel.org> writes: > Since the planner knows how many rows will > be going into the sort and how wide they are, ISTM it should be able to > estimate how much memory will be needed. ... which is different from how much will be available. See cost_sort(): * If the total volume of data to sort is less than work_mem, we will do * an in-memory sort, which requires no I/O and about t*log2(t) tuple * comparisons for t tuples. * * If the total volume exceeds work_mem, we switch to a tape-style merge * algorithm. There will still be about t*log2(t) tuple comparisons in * total, but we will also need to write and read each tuple once per * merge pass. We expect about ceil(log6(r)) merge passes where r is the * number of initial runs formed (log6 because tuplesort.c uses six-tape * merging). Since the average initial run should be about twice work_mem, * we have * disk traffic = 2 * relsize * ceil(log6(p / (2*work_mem))) * cpu = comparison_cost * t * log2(t) The actual cost of a sort is therefore *highly* sensitive to how much memory it is allowed to use. Blowing off any ability to estimate this in advance is not going to improve matters... regards, tom lane
On Sun, Jan 30, 2005 at 04:49:39PM -0500, Tom Lane wrote:
> The actual cost of a sort is therefore *highly* sensitive to how much
> memory it is allowed to use.  Blowing off any ability to estimate this
> in advance is not going to improve matters...

Ok, I wasn't sure which you were referring to. As I mentioned in the
earlier thread, there would have to be some means of accounting for how
much memory active sorts in the system are using. One possibility is
that a global counter is updated any time a sort allocates or frees
memory. If allocations are done 8k at a time that would probably be too
expensive, but I suspect it wouldn't be too bad with larger allocations
that are done less frequently. And bear in mind that if these changes
prevent even a few sorts an hour from spilling to disk then it's
probably still a net gain.

Doesn't the sort code also have a provision for starting an in-memory
sort, and then realizing that it's going to exceed work_mem and going
to disk at that point?

Yes, without something as simple as work_mem it is harder to determine
how much memory you can use, though I don't think it's impossible.
First, a global variable/counter could be used to keep track of how
much memory all the active sorts estimate they'll need. That would be a
first-order guess. This could also be compared to how much memory is
actually being used, though I suspect that would require globally
tracking how much memory each sort estimates it will need and how much
it's already using. Of course there would also be some kind of policy
limiting sorts from taking over all of sort_mem; possibly a GUC,
possibly something more dynamic. Whatever it is, that gives you an
upper bound for how much memory would be used at a given point in time.

BTW, something that just occurred to me... with the new cache
management code (ARC and whatever's going to replace it), presumably it
will now be better for PostgreSQL to do most of the caching instead of
the OS. This means you'd want to give PostgreSQL the lion's share of
memory, but with how sorts currently acquire memory, doing so would be
risky. Should there be an option in 8.0 for sorts to use shared memory
instead of allocating their own from a (possibly small) pool of memory
the OS has?
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org