Thread: changing sort_mem on the fly?

changing sort_mem on the fly?

From
Lonni J Friedman
Date:
Is there any way to increase sort_mem without having to perform a full
restart of the DB?  Will just doing a 'kill -HUP $PID" work as long as
i've made the change in postgresql.conf first?

As a side question, are the options that will take effect with a kill
-HUP documented somewhere?

thanks!

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: changing sort_mem on the fly?

From
"Jim C. Nasby"
Date:
On Thu, Jan 27, 2005 at 02:57:22PM -0800, Lonni J Friedman wrote:
> Is there any way to increase sort_mem without having to perform a full
> restart of the DB?  Will just doing a 'kill -HUP $PID" work as long as
> i've made the change in postgresql.conf first?
>
> As a side question, are the options that will take effect with a kill
> -HUP documented somewhere?

If you just want to change it for one connection, you can do 'set
sort_mem=newvalue;'. You might need to be an admin to do it, though.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: changing sort_mem on the fly?

From
Michael Fuhr
Date:
On Thu, Jan 27, 2005 at 06:23:21PM -0600, Jim C. Nasby wrote:
>
> If you just want to change it for one connection, you can do 'set
> sort_mem=newvalue;'. You might need to be an admin to do it, though.

Ordinary users can set sort_mem/work_mem, at least in 8.0.0 and
earlier.  Does setting that variable per session actually have an
effect, and if it does, does allowing ordinary users to set it
present a risk?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: changing sort_mem on the fly?

From
"Jonel Rienton"
Date:
I usually use pg_ctl reload instead of kill

On Thu, 27 Jan 2005 14:57:22 -0800, Lonni J Friedman wrote
> Is there any way to increase sort_mem without having to perform a
> full restart of the DB?  Will just doing a 'kill -HUP $PID" work as
> long as i've made the change in postgresql.conf first?
>
> As a side question, are the options that will take effect with a kill
> -HUP documented somewhere?
>
> thanks!
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> L. Friedman                                    netllama@gmail.com
> LlamaLand                       http://netllama.linux-sxs.org
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


--
Jonel Rienton
http://blogs.road14.com
Software Developer, *nix Advocate


Re: changing sort_mem on the fly?

From
Lonni J Friedman
Date:
OK, but does that actually work to change sort_mem for all future connections?


On Thu, 27 Jan 2005 19:25:51 -0600, Jonel Rienton <jonel@road14.com> wrote:
> I usually use pg_ctl reload instead of kill
>
> On Thu, 27 Jan 2005 14:57:22 -0800, Lonni J Friedman wrote
> > Is there any way to increase sort_mem without having to perform a
> > full restart of the DB?  Will just doing a 'kill -HUP $PID" work as
> > long as i've made the change in postgresql.conf first?
> >
> > As a side question, are the options that will take effect with a kill
> > -HUP documented somewhere?



--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: changing sort_mem on the fly?

From
Lonni J Friedman
Date:
On Thu, 27 Jan 2005 18:23:21 -0600, Jim C. Nasby <decibel@decibel.org> wrote:
> On Thu, Jan 27, 2005 at 02:57:22PM -0800, Lonni J Friedman wrote:
> > Is there any way to increase sort_mem without having to perform a full
> > restart of the DB?  Will just doing a 'kill -HUP $PID" work as long as
> > i've made the change in postgresql.conf first?
> >
> > As a side question, are the options that will take effect with a kill
> > -HUP documented somewhere?
>
> If you just want to change it for one connection, you can do 'set
> sort_mem=newvalue;'. You might need to be an admin to do it, though.

I need to do it for all future connections, not just one.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: changing sort_mem on the fly?

From
"Jim C. Nasby"
Date:
On Thu, Jan 27, 2005 at 06:23:05PM -0700, Michael Fuhr wrote:
> On Thu, Jan 27, 2005 at 06:23:21PM -0600, Jim C. Nasby wrote:
> >
> > If you just want to change it for one connection, you can do 'set
> > sort_mem=newvalue;'. You might need to be an admin to do it, though.
>
> Ordinary users can set sort_mem/work_mem, at least in 8.0.0 and
> earlier.  Does setting that variable per session actually have an
> effect, and if it does, does allowing ordinary users to set it
> present a risk?

I ass-u-me that the setting actually takes effect. You could test it by
setting it high enough to send a system into swapping and run a query
that will use enough sort_mem. If it starts swapping, it worked. :)

Yes, there is a risk allowing ad-hoc settings; you can starve the
machine for memory.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: changing sort_mem on the fly?

From
"Jim C. Nasby"
Date:
On Thu, Jan 27, 2005 at 05:52:41PM -0800, Lonni J Friedman wrote:
> On Thu, 27 Jan 2005 18:23:21 -0600, Jim C. Nasby <decibel@decibel.org> wrote:
> > On Thu, Jan 27, 2005 at 02:57:22PM -0800, Lonni J Friedman wrote:
> > > Is there any way to increase sort_mem without having to perform a full
> > > restart of the DB?  Will just doing a 'kill -HUP $PID" work as long as
> > > i've made the change in postgresql.conf first?
> > >
> > > As a side question, are the options that will take effect with a kill
> > > -HUP documented somewhere?
> >
> > If you just want to change it for one connection, you can do 'set
> > sort_mem=newvalue;'. You might need to be an admin to do it, though.
>
> I need to do it for all future connections, not just one.

'm guessing pg_ctl reload would take effect on all new connections. Try
it, and see what psql -c 'show sort_mem' says.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: changing sort_mem on the fly?

From
Lonni J Friedman
Date:
On Thu, 27 Jan 2005 20:05:19 -0600, Jim C. Nasby <decibel@decibel.org> wrote:
> On Thu, Jan 27, 2005 at 05:52:41PM -0800, Lonni J Friedman wrote:
> > On Thu, 27 Jan 2005 18:23:21 -0600, Jim C. Nasby <decibel@decibel.org> wrote:
> > > On Thu, Jan 27, 2005 at 02:57:22PM -0800, Lonni J Friedman wrote:
> > > > Is there any way to increase sort_mem without having to perform a full
> > > > restart of the DB?  Will just doing a 'kill -HUP $PID" work as long as
> > > > i've made the change in postgresql.conf first?
> > > >
> > > > As a side question, are the options that will take effect with a kill
> > > > -HUP documented somewhere?
> > >
> > > If you just want to change it for one connection, you can do 'set
> > > sort_mem=newvalue;'. You might need to be an admin to do it, though.
> >
> > I need to do it for all future connections, not just one.
>
> 'm guessing pg_ctl reload would take effect on all new connections. Try
> it, and see what psql -c 'show sort_mem' says.

excellent, that worked!  thanks!


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: changing sort_mem on the fly?

From
Neil Conway
Date:
On Thu, 2005-01-27 at 20:03 -0600, Jim C. Nasby wrote:
> Yes, there is a risk allowing ad-hoc settings; you can starve the
> machine for memory.

A malicious user who can execute SELECT queries can already consume an
arbitrary amount of memory -- say, by disabling GEQO and self-joining
pg_class to itself 50 times. I'm not sure that letting users modify
sort_mem/work_mem actually increases the risk from malicious users.
Restricting this parameter to superusers only would also be a hit to
usability.

-Neil



Re: changing sort_mem on the fly?

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> A malicious user who can execute SELECT queries can already consume an
> arbitrary amount of memory -- say, by disabling GEQO and self-joining
> pg_class to itself 50 times. I'm not sure that letting users modify
> sort_mem/work_mem actually increases the risk from malicious users.

The correct place for a sysadmin to limit memory usage would be in the
ulimit settings the postmaster starts under.  Of course, Neil's argument
still holds in general: anyone who can write arbitrary queries is not
going to have any difficulty in soaking up unreasonable amounts of
resources.  Trying to restrict that would probably make the system less
useful rather than more so.

            regards, tom lane

Re: changing sort_mem on the fly?

From
"Jim C. Nasby"
Date:
On Fri, Jan 28, 2005 at 02:43:13AM -0500, Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
> > A malicious user who can execute SELECT queries can already consume an
> > arbitrary amount of memory -- say, by disabling GEQO and self-joining
> > pg_class to itself 50 times. I'm not sure that letting users modify
> > sort_mem/work_mem actually increases the risk from malicious users.
>
> The correct place for a sysadmin to limit memory usage would be in the
> ulimit settings the postmaster starts under.  Of course, Neil's argument
> still holds in general: anyone who can write arbitrary queries is not
> going to have any difficulty in soaking up unreasonable amounts of
> resources.  Trying to restrict that would probably make the system less
> useful rather than more so.

Just to clarify, I wasn't suggesting limiting sort_mem to only admins, I
was only pointing out that it's possible for a user to starve the box of
memory with it. I'd also point out that IMHO it's far more likely that
someone could accidentally hose the box with sort_mem than with a
SELECT. In the year 2020 when there's a bunch of bored developers
sitting around wondering what to do, it might be nice to have the
ability to apply some security to GUC settings, possibly just being able
to restrict them to super-user, and having a SECURITY DEFINER function
users can call instead that could apply permissions.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: changing sort_mem on the fly?

From
Neil Conway
Date:
Tom Lane wrote:
> The correct place for a sysadmin to limit memory usage would be in the
> ulimit settings the postmaster starts under.  Of course, Neil's argument
> still holds in general: anyone who can write arbitrary queries is not
> going to have any difficulty in soaking up unreasonable amounts of
> resources.  Trying to restrict that would probably make the system less
> useful rather than more so.

I'm not sure if I agree that there's no potential for implementing
better resource limits/quotas in PG in the future, I was just pointing
out that it would require a lot more work to prevent resource
consumption by malicious users than merely limiting who is allowed to
set sort_mem/work_mem. If you could implement per-user/per-connection
limits on things like processor usage or disk space consumption, I think
that would be useful to some users (e.g. people offering PG in a web
hosting environment).

-Neil

Re: changing sort_mem on the fly?

From
"Jim C. Nasby"
Date:
On Sat, Jan 29, 2005 at 09:23:06PM +1100, Neil Conway wrote:
> Tom Lane wrote:
> >The correct place for a sysadmin to limit memory usage would be in the
> >ulimit settings the postmaster starts under.  Of course, Neil's argument
> >still holds in general: anyone who can write arbitrary queries is not
> >going to have any difficulty in soaking up unreasonable amounts of
> >resources.  Trying to restrict that would probably make the system less
> >useful rather than more so.
>
> I'm not sure if I agree that there's no potential for implementing
> better resource limits/quotas in PG in the future, I was just pointing
> out that it would require a lot more work to prevent resource
> consumption by malicious users than merely limiting who is allowed to
> set sort_mem/work_mem. If you could implement per-user/per-connection
> limits on things like processor usage or disk space consumption, I think
> that would be useful to some users (e.g. people offering PG in a web
> hosting environment).

Since you brought up the future... :)

I'd really like to see an improvement in how sort_mem/work_mem is
handled. It's currently impossible to set it in a way to support
moderately large sorts (say, 5% of available memory) without either
embedding 'set sort_mem = blah' in your code or running the risk that at
some point your database server will start swapping. I wish there was
some way to limit total sort size for the entire system, or at least per
connection/query.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: changing sort_mem on the fly?

From
Neil Conway
Date:
Jim C. Nasby wrote:
> I'd really like to see an improvement in how sort_mem/work_mem is
> handled.

So would I :) (I think it's well known that the current system is not
optimal.)

Do you have any thoughts on how to improve it?

-Neil

Re: changing sort_mem on the fly?

From
"Jim C. Nasby"
Date:
On Sun, Jan 30, 2005 at 08:50:49PM +1100, Neil Conway wrote:
> Jim C. Nasby wrote:
> >I'd really like to see an improvement in how sort_mem/work_mem is
> >handled.
>
> So would I :) (I think it's well known that the current system is not
> optimal.)
>
> Do you have any thoughts on how to improve it?

See http://archives.postgresql.org/pgsql-general/2003-06/msg01072.php
and http://archives.postgresql.org/pgsql-general/2003-06/msg01089.php

There could certainly be better ways than what I proposed, too. It might
be worth investigating how the 'big 3' handle it, though some of what
they do might be patented.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: changing sort_mem on the fly?

From
Tom Lane
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:
> On Sun, Jan 30, 2005 at 08:50:49PM +1100, Neil Conway wrote:
>> Do you have any thoughts on how to improve it?

> See http://archives.postgresql.org/pgsql-general/2003-06/msg01072.php
> and http://archives.postgresql.org/pgsql-general/2003-06/msg01089.php

All of this falls down on twin problems: (1) we have no portable way to
find out how much RAM is really available, and (2) the planner has to
make a cost estimate for the sort in advance of actual runtime --- which
requires making some assumption about the value of sort_mem that will be
used.

            regards, tom lane

Re: changing sort_mem on the fly?

From
"Jim C. Nasby"
Date:
On Sun, Jan 30, 2005 at 01:05:15PM -0500, Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
> > On Sun, Jan 30, 2005 at 08:50:49PM +1100, Neil Conway wrote:
> >> Do you have any thoughts on how to improve it?
>
> > See http://archives.postgresql.org/pgsql-general/2003-06/msg01072.php
> > and http://archives.postgresql.org/pgsql-general/2003-06/msg01089.php
>
> All of this falls down on twin problems: (1) we have no portable way to
> find out how much RAM is really available, and (2) the planner has to
> make a cost estimate for the sort in advance of actual runtime --- which
> requires making some assumption about the value of sort_mem that will be
> used.

As I said, I'm sure there's better minds who can come up with better
ideas than I. :) I personally think this is important enough to warrant
a TODO so those minds can think about it at some point, but that's just
my opinion.

As for your two points: if you mean how much RAM is available for
sorting, I assumed there would be a GUC for that, although
effective_cache_size might be used.

I thought that the planner did estimate sort size when it called the
sort code, but it's been a long time since I looked at it and it was
somewhat over my head anyway. Since the planner knows how many rows will
be going into the sort and how wide they are, ISTM it should be able to
estimate how much memory will be needed. Even if it can't, I think you
could still get by with just keeping track of how much memory a sort is
using when it asks for more memory. At some point it would hit a
threshold and would then spill to disk.

BTW, if it would be useful, I could test sort speeds for sorts that
spill to disk with different sort_mem settings. That would help answer
the question of how much it would hurt for a sort that spills to disk to
give back some of it's memory so that other sort operations wouldn't
spill to disk.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: changing sort_mem on the fly?

From
Tom Lane
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:
> Since the planner knows how many rows will
> be going into the sort and how wide they are, ISTM it should be able to
> estimate how much memory will be needed.

... which is different from how much will be available.  See cost_sort():

 * If the total volume of data to sort is less than work_mem, we will do
 * an in-memory sort, which requires no I/O and about t*log2(t) tuple
 * comparisons for t tuples.
 *
 * If the total volume exceeds work_mem, we switch to a tape-style merge
 * algorithm.  There will still be about t*log2(t) tuple comparisons in
 * total, but we will also need to write and read each tuple once per
 * merge pass.    We expect about ceil(log6(r)) merge passes where r is the
 * number of initial runs formed (log6 because tuplesort.c uses six-tape
 * merging).  Since the average initial run should be about twice work_mem,
 * we have
 *        disk traffic = 2 * relsize * ceil(log6(p / (2*work_mem)))
 *        cpu = comparison_cost * t * log2(t)

The actual cost of a sort is therefore *highly* sensitive to how much
memory it is allowed to use.  Blowing off any ability to estimate this
in advance is not going to improve matters...

            regards, tom lane

Re: changing sort_mem on the fly?

From
"Jim C. Nasby"
Date:
On Sun, Jan 30, 2005 at 04:49:39PM -0500, Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
> > Since the planner knows how many rows will
> > be going into the sort and how wide they are, ISTM it should be able to
> > estimate how much memory will be needed.
>
> ... which is different from how much will be available.  See cost_sort():

Ok, I wasn't sure which you were refering to. As I mentioned in the
earlier thread, there would have to be some means of accounting for how
much memory active sorts in the system are using. One possibility is
that a global counter is updated any time a sort allocates or frees
memory. If allocations are done 8k at a time that would probably be too
expensive, but I suspect it wouldn't be too bad with larger allocations
that are done less frequently. And bear in mind that if these changes
prevent even a few sorts an hour from spilling to disk then it's
probably still a net gain.

>  * If the total volume of data to sort is less than work_mem, we will do
>  * an in-memory sort, which requires no I/O and about t*log2(t) tuple
>  * comparisons for t tuples.
>  *
>  * If the total volume exceeds work_mem, we switch to a tape-style merge
>  * algorithm.  There will still be about t*log2(t) tuple comparisons in
>  * total, but we will also need to write and read each tuple once per
>  * merge pass.    We expect about ceil(log6(r)) merge passes where r is the
>  * number of initial runs formed (log6 because tuplesort.c uses six-tape
>  * merging).  Since the average initial run should be about twice work_mem,
>  * we have
>  *        disk traffic = 2 * relsize * ceil(log6(p / (2*work_mem)))
>  *        cpu = comparison_cost * t * log2(t)
>
> The actual cost of a sort is therefore *highly* sensitive to how much
> memory it is allowed to use.  Blowing off any ability to estimate this
> in advance is not going to improve matters...

Doesn't the sort code also have a provision for starting an in-memory
sort, and then realizing that it's going to exceed work_mem and going to
disk at that point?

Yes, without something as simple as work_mem it is harder to determine
how much memory you can use, though I don't think it's impossible.
First, a global variable/counter could be used to keep track of how much
memory all the active sorts estimate they'll need. That would be a
first-order guess. This could also be compared to how much memory is
actually being used, though I suspect that would require globally
tracking how much memory each sort estimates it will need and how much
it's already using. Of course there would also be some kind of policy
limiting sorts from taking over all of sort_mem; possibly a GUC,
possibly something more dynamic. Whatever it is, that gives you an
upper-bounds for how much memory would be used at a given point in time.

BTW, something that just occured to me... with the new cache management
code (ARC and whatever's going to replace it), presumably it will now be
better for PostgreSQL to primarily do caching instead of the OS. This
means you'd want to give PostgreSQL the lion's share of memory, but with
how sorts currently aquire memory doing so would be risky. Should there
be an option in 8.0 for sorts to use shared memory instead of allocating
their own from a (possibly small) pool of memory the OS has?
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"