Thread: Idea about better configuration options for sort memory

Idea about better configuration options for sort memory

From
Tom Lane
Date:
We frequently recommend to people that they increase sort_mem while
creating btree indexes.  It is reasonable to have a larger setting
for that purpose, since (1) a single backend isn't going to be doing
multiple index creations in parallel (whereas complex queries could
easily be doing multiple sorts or hashes in parallel), and (2) in most
installations you won't have a large number of backends doing index
creations in parallel.  So while sort_mem has to be set on the
assumption that you might need quite a few times the nominal setting,
this isn't true for index creation.

It strikes me that we ought to revise the configuration options to
reflect this fact: index creation's memory limit should be driven by
a separate parameter instead of using sort_mem.

We already have a memory-usage parameter that is larger than sort_mem,
and for exactly the same reasons sketched above.  It's vacuum_mem.
VACUUM is also an operation that you don't expect to be running lots of
instances of in parallel, so it's okay for it to eat more than average
amounts of RAM.

So, what I'd like to do is make btree index creation pay attention to
vacuum_mem instead of sort_mem, and rename the vacuum_mem parameter to
some more-generic name indicating that it's used for more than just
VACUUM.  Any objections so far?

Now, what should we call it instead?  I haven't come up with any
compelling thoughts --- the best I can do is "big_sort_mem" or
"single_sort_mem".  Surely someone out there has a better idea.

BTW, does anyone want to lobby for renaming sort_mem at the same time?
Since it's used for sizing hash tables as well as sort workspace, it's
rather misnamed.  I hesitate to rename it because of the potential for
confusion though.  People are pretty used to the existing name.
        regards, tom lane


Re: Idea about better configuration options for sort memory

From
"Marc G. Fournier"
Date:
On Sat, 31 Jan 2004, Tom Lane wrote:

> So, what I'd like to do is make btree index creation pay attention to
> vacuum_mem instead of sort_mem, and rename the vacuum_mem parameter to
> some more-generic name indicating that it's used for more than just
> VACUUM.  Any objections so far?

Why not create a seperate index_mem variable instead?  index creation
tends to be, I think, less frequent then vacuum, so having a higher value
for index_mem then vacuum_mem may make sense ...


----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664


Re: Idea about better configuration options for sort memory

From
Tom Lane
Date:
"Marc G. Fournier" <scrappy@postgresql.org> writes:
> On Sat, 31 Jan 2004, Tom Lane wrote:
>> So, what I'd like to do is make btree index creation pay attention to
>> vacuum_mem instead of sort_mem, and rename the vacuum_mem parameter to
>> some more-generic name indicating that it's used for more than just
>> VACUUM.  Any objections so far?

> Why not create a seperate index_mem variable instead?  index creation
> tends to be, I think, less frequent then vacuum, so having a higher value
> for index_mem then vacuum_mem may make sense ...

Well, maybe.  What's in the back of my mind is that we may come across
other cases besides CREATE INDEX and VACUUM that should use a "one-off"
setting.  I think it'd make more sense to have one parameter than keep
on inventing new ones.  For comparison, SortMem is used for quite a few
different purposes, but I can't recall anyone needing to tweak an
individual one of those purposes other than CREATE INDEX.
        regards, tom lane


Re: Idea about better configuration options for sort memory

From
Joe Conway
Date:
Tom Lane wrote:
> Now, what should we call it instead?  I haven't come up with any
> compelling thoughts --- the best I can do is "big_sort_mem" or
> "single_sort_mem".  Surely someone out there has a better idea.
> 
> BTW, does anyone want to lobby for renaming sort_mem at the same time?
> Since it's used for sizing hash tables as well as sort workspace, it's
> rather misnamed.  I hesitate to rename it because of the potential for
> confusion though.  People are pretty used to the existing name.

Hmmm ... maybe query_work_mem and maintenance_work_mem, or something 
similar?

Joe




Re: Idea about better configuration options for sort memory

From
"Marc G. Fournier"
Date:
On Sat, 31 Jan 2004, Tom Lane wrote:

> "Marc G. Fournier" <scrappy@postgresql.org> writes:
> > On Sat, 31 Jan 2004, Tom Lane wrote:
> >> So, what I'd like to do is make btree index creation pay attention to
> >> vacuum_mem instead of sort_mem, and rename the vacuum_mem parameter to
> >> some more-generic name indicating that it's used for more than just
> >> VACUUM.  Any objections so far?
>
> > Why not create a seperate index_mem variable instead?  index creation
> > tends to be, I think, less frequent then vacuum, so having a higher value
> > for index_mem then vacuum_mem may make sense ...
>
> Well, maybe.  What's in the back of my mind is that we may come across
> other cases besides CREATE INDEX and VACUUM that should use a "one-off"
> setting.  I think it'd make more sense to have one parameter than keep
> on inventing new ones.  For comparison, SortMem is used for quite a few
> different purposes, but I can't recall anyone needing to tweak an
> individual one of those purposes other than CREATE INDEX.

Why not a 'default_mem' parameter that auto-sets the others if not
explicitly set?  note that, at least in my case, I didn't know that
sort_mem affected CREATE INDEX, only ORDER/GORUP BYs ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664


Re: Idea about better configuration options for sort memory

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> BTW, does anyone want to lobby for renaming sort_mem at the same time?
>> Since it's used for sizing hash tables as well as sort workspace, it's
>> rather misnamed.  I hesitate to rename it because of the potential for
>> confusion though.  People are pretty used to the existing name.

> Hmmm ... maybe query_work_mem and maintenance_work_mem, or something 
> similar?

I'll go with these unless someone has another proposal ...
        regards, tom lane


Re: Idea about better configuration options for sort memory

From
Bruce Momjian
Date:
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
> > Tom Lane wrote:
> >> BTW, does anyone want to lobby for renaming sort_mem at the same time?
> >> Since it's used for sizing hash tables as well as sort workspace, it's
> >> rather misnamed.  I hesitate to rename it because of the potential for
> >> confusion though.  People are pretty used to the existing name.
> 
> > Hmmm ... maybe query_work_mem and maintenance_work_mem, or something 
> > similar?
> 
> I'll go with these unless someone has another proposal ...

The only confusion is that you can use multiple query_work_mem per
query, but I can't think of a better name.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Idea about better configuration options for sort memory

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Joe Conway <mail@joeconway.com> writes:
>>> Hmmm ... maybe query_work_mem and maintenance_work_mem, or something 
>>> similar?
>> 
>> I'll go with these unless someone has another proposal ...

> The only confusion is that you can use multiple query_work_mem per
> query, but I can't think of a better name.

True.  Maybe just "work_mem" and "maintenance_work_mem"?

BTW, I am going to look at whether GUC can be persuaded to continue to
allow "sort_mem" as an alternate name, if we rename it.  That would
alleviate most of the backward-compatibility issues of changing such
a well-known parameter name.
        regards, tom lane


Re: Idea about better configuration options for sort memory

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> Joe Conway <mail@joeconway.com> writes:
> >>> Hmmm ... maybe query_work_mem and maintenance_work_mem, or something 
> >>> similar?
> >> 
> >> I'll go with these unless someone has another proposal ...
> 
> > The only confusion is that you can use multiple query_work_mem per
> > query, but I can't think of a better name.
> 
> True.  Maybe just "work_mem" and "maintenance_work_mem"?
> 
> BTW, I am going to look at whether GUC can be persuaded to continue to
> allow "sort_mem" as an alternate name, if we rename it.  That would
> alleviate most of the backward-compatibility issues of changing such
> a well-known parameter name.

Good. It is not like we have a huge namespace limitation in there.  I
wonder if we could cost it as a list of string pointers, null
terminated.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Idea about better configuration options for sort memory

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> BTW, I am going to look at whether GUC can be persuaded to continue to
>> allow "sort_mem" as an alternate name, if we rename it.  That would
>> alleviate most of the backward-compatibility issues of changing such
>> a well-known parameter name.

> Good. It is not like we have a huge namespace limitation in there.  I
> wonder if we could cost it as a list of string pointers, null
> terminated.

After looking at the code a bit, I think the simplest solution is for
find_option to look in a separate mapping table (mapping from old to new
option name) if it doesn't find the given name in the main table.  This
would make lookup of "old" names a shade slower than "preferred" names,
but that doesn't seem like a problem.

With this approach, old GUC names would be recognized in SHOW and SET
commands, as well as the other ways you can set a variable by name
(postgresql.conf, ALTER USER SET, etc).  But only the new names would
appear in SHOW ALL or the pg_settings view.  Does that seem OK?
        regards, tom lane


Re: Idea about better configuration options for sort memory

From
Gaetano Mendola
Date:
Tom Lane wrote:


> With this approach, old GUC names would be recognized in SHOW and SET
> commands, as well as the other ways you can set a variable by name
> (postgresql.conf, ALTER USER SET, etc).  But only the new names would
> appear in SHOW ALL or the pg_settings view.  Does that seem OK?

May be raise a warning if the old GUC is used, can speed the
adoption of new GUCs variables.

Regards
Gaetano Mendola


Re: Idea about better configuration options for sort memory

From
Joe Conway
Date:
Tom Lane wrote:
> With this approach, old GUC names would be recognized in SHOW and SET
> commands, as well as the other ways you can set a variable by name
> (postgresql.conf, ALTER USER SET, etc).  But only the new names would
> appear in SHOW ALL or the pg_settings view.  Does that seem OK?
> 

Seems OK to me, in fact maybe preferred. But I wonder if we should emit 
a NOTICE when old names are used with SHOW and SET commands?

Joe



Re: Idea about better configuration options for sort memory

From
Christopher Kings-Lynne
Date:
> Seems OK to me, in fact maybe preferred. But I wonder if we should emit 
> a NOTICE when old names are used with SHOW and SET commands?

A WARNING should be issued.

Chris



Re: Idea about better configuration options for sort memory

From
Fabien COELHO
Date:
On Sat, 31 Jan 2004, Tom Lane wrote:

> Now, what should we call it instead?  I haven't come up with any
> compelling thoughts --- the best I can do is "big_sort_mem" or
> "single_sort_mem".  Surely someone out there has a better idea.

vacuuming and indexing are not too frequent database administration tasks.

administration_mem ? admin_mem ? system_mem ? system_task_mem ?
systask_mem ? executive_mem ?

Anything may be chosen, but the point is to have a good comment within the
template configuration file.

I would nevertheless avoid "big".
What is big changes overtime with computers;-)

-- 
Fabien.


Re: Idea about better configuration options for sort

From
Jeff
Date:
> On Sat, 31 Jan 2004, Tom Lane wrote:
> 
> > "Marc G. Fournier" <scrappy@postgresql.org> writes:
> > > On Sat, 31 Jan 2004, Tom Lane wrote:
> > >> So, what I'd like to do is make btree index creation pay
> > >attention to> vacuum_mem instead of sort_mem, and rename the
> > >vacuum_mem parameter to> some more-generic name indicating that
> > >it's used for more than just> VACUUM.  Any objections so far?
> >
> > > Why not create a seperate index_mem variable instead?  index
> > > creation tends to be, I think, less frequent then vacuum, so
> > > having a higher value for index_mem then vacuum_mem may make sense
> > > ...
> >
> > Well, maybe.  What's in the back of my mind is that we may come
> > across other cases besides CREATE INDEX and VACUUM that should use a
> > "one-off" setting.  I think it'd make more sense to have one
> > parameter than keep on inventing new ones.  For comparison, SortMem
> > is used for quite a few different purposes, but I can't recall
> > anyone needing to tweak an individual one of those purposes other
> > than CREATE INDEX.
> 

I don't know if this would apply here - but foriegn key creation also
benefits hugely from jacking up sort_mem and you also don't do too many
of those in parellel.   

I'm guessing it would be quite in-elegant and kludgy to make that code
use the bigger pool.. it would benefit restore times though.

-- 
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/


Re: Idea about better configuration options for sort memory

From
Bruce Momjian
Date:
Tom Lane wrote:
> After looking at the code a bit, I think the simplest solution is for
> find_option to look in a separate mapping table (mapping from old to new
> option name) if it doesn't find the given name in the main table.  This
> would make lookup of "old" names a shade slower than "preferred" names,
> but that doesn't seem like a problem.
> 
> With this approach, old GUC names would be recognized in SHOW and SET
> commands, as well as the other ways you can set a variable by name
> (postgresql.conf, ALTER USER SET, etc).  But only the new names would
> appear in SHOW ALL or the pg_settings view.  Does that seem OK?

Sounds good.  The idea that we can keep the names constant seems to be a
losing proposal.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Idea about better configuration options for sort memory

From
Tom Lane
Date:
Jeff <threshar@torgo.978.org> writes:
>> On Sat, 31 Jan 2004, Tom Lane wrote:
> Well, maybe.  What's in the back of my mind is that we may come
> across other cases besides CREATE INDEX and VACUUM that should use a
> "one-off" setting.  I think it'd make more sense to have one
> parameter than keep on inventing new ones.

> I don't know if this would apply here - but foriegn key creation also
> benefits hugely from jacking up sort_mem and you also don't do too many
> of those in parellel.   

> I'm guessing it would be quite in-elegant and kludgy to make that code
> use the bigger pool.. it would benefit restore times though.

Actually, it wouldn't be all that hard.  We could make
RI_Initial_Check() do the equivalent of "SET LOCAL work_mem" before
issuing the query, and then again afterwards to restore the prior
value.  This would have no permanent effect on work_mem, because the
old value would be restored by transaction abort if the check query
fails.

This seems like a good idea to me, so I'll do it unless I hear
objections.
        regards, tom lane


Re: Idea about better configuration options for sort memory

From
"scott.marlowe"
Date:
On Mon, 2 Feb 2004, Tom Lane wrote:

> Jeff <threshar@torgo.978.org> writes:
> >> On Sat, 31 Jan 2004, Tom Lane wrote:
> > Well, maybe.  What's in the back of my mind is that we may come
> > across other cases besides CREATE INDEX and VACUUM that should use a
> > "one-off" setting.  I think it'd make more sense to have one
> > parameter than keep on inventing new ones.
> 
> > I don't know if this would apply here - but foriegn key creation also
> > benefits hugely from jacking up sort_mem and you also don't do too many
> > of those in parellel.   
> 
> > I'm guessing it would be quite in-elegant and kludgy to make that code
> > use the bigger pool.. it would benefit restore times though.
> 
> Actually, it wouldn't be all that hard.  We could make
> RI_Initial_Check() do the equivalent of "SET LOCAL work_mem" before
> issuing the query, and then again afterwards to restore the prior
> value.  This would have no permanent effect on work_mem, because the
> old value would be restored by transaction abort if the check query
> fails.
> 
> This seems like a good idea to me, so I'll do it unless I hear
> objections.

any chance of having some kind of max_total_sort_mem setting to keep 
machines out of swap storms, or would that be a nightmare to implement?



Re: Idea about better configuration options for sort memory

From
Tom Lane
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> any chance of having some kind of max_total_sort_mem setting to keep 
> machines out of swap storms, or would that be a nightmare to implement?

I don't see any reasonable way to do that.
        regards, tom lane


Re: Idea about better configuration options for sort memory

From
"scott.marlowe"
Date:
On Mon, 2 Feb 2004, Tom Lane wrote:

> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > any chance of having some kind of max_total_sort_mem setting to keep 
> > machines out of swap storms, or would that be a nightmare to implement?
> 
> I don't see any reasonable way to do that.

I didn't think there was.  just hoping... :-)



Re: Idea about better configuration options for sort memory

From
Bruce Momjian
Date:
scott.marlowe wrote:
> On Mon, 2 Feb 2004, Tom Lane wrote:
> 
> > "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > > any chance of having some kind of max_total_sort_mem setting to keep 
> > > machines out of swap storms, or would that be a nightmare to implement?
> > 
> > I don't see any reasonable way to do that.
> 
> I didn't think there was.  just hoping... :-)

Someone asked for this in Copenhagen, and I said we can't see how to do
it.  The only idea I had as to give the first requestor 50% of the
total, then a second query 50% of the remaining memory.  Is that better
than what we have?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Idea about better configuration options for sort memory

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> I didn't think there was.  just hoping... :-)

> Someone asked for this in Copenhagen, and I said we can't see how to do
> it.  The only idea I had as to give the first requestor 50% of the
> total, then a second query 50% of the remaining memory.  Is that better
> than what we have?

How would you do that --- who's the "first requestor"?  The delay
between planning and execution for prepared statements (including
plpgsql functions) seems to make it impossible to do anything useful in
terms of dynamic allocation of memory.

What would be more reasonable to try for is a per-query upper limit on
space consumption.  That at least avoids any concurrency issues and
reduces it to a pure planning problem.  However, I don't see any real
good way to do that either.  With the bottom-up planning process we use,
the cost of (say) a first-level sort must be assigned before we know
whether any additional sorts or hashes will be needed at upper levels.

I thought a little bit about assuming that one workspace would be needed
per input relation --- that is, if there are N relations in the query
then set SortMem to TotalQueryMem/N.  But this would severely penalize
plans that need fewer workspaces than that.

Another tack is to let the planner assume SortMem per workspace but at
executor start (where we could know the number of plan nodes that
actually need workspaces) set the effective SortMem to TotalQueryMem/N.
The trouble with this is you could end up with a severely nonoptimal
plan, eg a sort or hash being done in much too little space.
        regards, tom lane


Re: Idea about better configuration options for sort

From
Rod Taylor
Date:
> > > "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > > > any chance of having some kind of max_total_sort_mem setting to keep
> > > > machines out of swap storms, or would that be a nightmare to implement?

> Someone asked for this in Copenhagen, and I said we can't see how to do
> it.  The only idea I had as to give the first requestor 50% of the
> total, then a second query 50% of the remaining memory.  Is that better
> than what we have?

Lets look at it from another direction. The goal isn't to set a maximum
memory amount, but to avoid swapping.

Add a toggle to PostgreSQL that says (essentially) "I am the only
resource intensive program running".

If this was done, could we not work closer with the kernel? Ask the
kernel how much Free + Buffer memory there is, knock it down by 75% and
use that for our sort memory value (total sort memory for individual
backend -- not operation).

--
Rod Taylor <rbt [at] rbt [dot] ca>

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc

Re: Idea about better configuration options for sort

From
"Simon Riggs"
Date:
>Rod Taylor writes
> > > > "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > > > > any chance of having some kind of max_total_sort_mem setting
to
> keep
> > > > > machines out of swap storms, or would that be a nightmare to
> implement?
> 
> > Someone asked for this in Copenhagen, and I said we can't see how to
do
> > it.  The only idea I had as to give the first requestor 50% of the
> > total, then a second query 50% of the remaining memory.  Is that
better
> > than what we have?
> 
> Lets look at it from another direction. The goal isn't to set a
maximum
> memory amount, but to avoid swapping.

I very much like your high level thinking, though on balance, I
personally do want to control the maximum memory allocation. It seems to
me that in general, there are just too many possibilities for what you
might want to mix on the same system. Perhaps we should restate the goal
slightly as being "maximising performance, whilst minimizing the RISK of
swapping".
An alternate suggestion might be a max_instance_mem setting, from which
all other memory allocations by that postgresql server were derived.
That way, however the "black box" operates, you have a single,
well-defined control point that will allow you to be as generous as you
see fit, but no further. [There's probably a few views on the
instance/database etc thing... I'm happy with more than one control
point - the name is less relevant] You can always write a script to
calculate the setting of this as a percentage of physical memory if you
want to do this automatically.

The suggestion about using percentages as relative rather than absolute
memory allocation has definitely been used successfully in the past on
other software systems. ...not the half-again each time method, but
assigning memory as a percentage of whatever's allocated. That way you
can raise the limit without changing everything else.

Best regards, Simon Riggs



Re: Idea about better configuration options for sort memory

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> > Hmmm ... maybe query_work_mem and maintenance_work_mem, or something 
> > similar?
> 
> I'll go with these unless someone has another proposal ...

dml_sort_mem and ddl_sort_mem ?

-- 
greg



Re: Idea about better configuration options for sort memory

From
"scott.marlowe"
Date:
On 12 Feb 2004, Greg Stark wrote:

> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > > Hmmm ... maybe query_work_mem and maintenance_work_mem, or something 
> > > similar?
> > 
> > I'll go with these unless someone has another proposal ...
> 
> dml_sort_mem and ddl_sort_mem ?

I like those.  Are they an accurte representation of what's going on?  If 
so, I'd go with these, as they are more easily recognizable by folks 
who've worked with dbs for a while.  On the other hand, they're probably 
less recognizable to the newbies.



Re: Idea about better configuration options for sort memory

From
Bruce Momjian
Date:
Greg Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > > Hmmm ... maybe query_work_mem and maintenance_work_mem, or something 
> > > similar?
> > 
> > I'll go with these unless someone has another proposal ...
> 
> dml_sort_mem and ddl_sort_mem ?

I thought about that, but didn't think DML/DDL was recognized by most
admins.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Idea about better configuration options for sort memory

From
Tom Lane
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> On 12 Feb 2004, Greg Stark wrote:
>> dml_sort_mem and ddl_sort_mem ?

> I like those.  Are they an accurte representation of what's going on?

No, not particularly ...
        regards, tom lane