Thread: Allow sorts to use more available memory

Allow sorts to use more available memory

From: Robert Schnabel
The recent "data warehouse" thread made me think about how I use
work_mem for some of my big queries.  So I tried SET work_mem = '4GB'
for a session and got

ERROR: 4194304 is outside the valid range for parameter "work_mem" (64
.. 2097151)

A bit of searching turned up the "Allow sorts to use more available
memory" section of the to-do list.  Am I correct in reading that the
max_val is 2GB and regardless of how much RAM I have in the box I'm
stuck with only using 2GB?  Am I missing something?

I'm using: PostgreSQL 9.0.4, compiled by Visual C++ build 1500, 64-bit
Windows 2008 Server Enterprise

Thanks,
Bob


Re: Allow sorts to use more available memory

From: Andy Colson
On 9/12/2011 12:33 PM, Robert Schnabel wrote:
> The recent "data warehouse" thread made me think about how I use
> work_mem for some of my big queries. So I tried SET work_mem = '4GB' for
> a session and got
>
> ERROR: 4194304 is outside the valid range for parameter "work_mem" (64
> .. 2097151)
>
> A bit of searching turned up the "Allow sorts to use more available
> memory" section of the to-do list. Am I correct in reading that the
> max_val is 2GB and regardless of how much RAM I have in the box I'm
> stuck with only using 2GB? Am I missing something?
>
> I'm using: PostgreSQL 9.0.4, compiled by Visual C++ build 1500, 64-bit
> Windows 2008 Server Enterprise
>
> Thanks,
> Bob
>
>
work_mem is not the total a query can use.  I believe each step (each
sort or hash) can use that much, and each backend can use it for
multiple steps.  So if you had two backends, each doing two sorts,
you'd use 2 * 2 = 4 sorts * 2GB = 8GB.

-Andy

Re: Allow sorts to use more available memory

From: Shaun Thomas
On 09/12/2011 12:47 PM, Andy Colson wrote:

> work_mem is not the total a query can use. I believe each step (each
> sort or hash) can use that much, and each backend can use it for
> multiple steps. So if you had two backends, each doing two sorts,
> you'd use 2 * 2 = 4 sorts * 2GB = 8GB.

Exactly. Find a big query somewhere in your system. Use EXPLAIN to
examine it. Chances are, that one query has one or more sorts. Each one
of those gets its own work_mem. Each sort. Does the query have four
sorts? It may use 4*work_mem. On a whim a while back, I doubled our
8MB setting to
16MB on a test system. During a load test, the machine ran out of
memory, swapped out, and finally crashed after the OOM killer went nuts.
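
For example, a plan shaped like this (query and plan made up, just to
illustrate counting Sort nodes) has three sorts, and each of them can
claim its own work_mem:

EXPLAIN SELECT * FROM big_a a JOIN big_b b USING (id) ORDER BY a.ts;

 Sort  (cost=...)
   Sort Key: a.ts
   ->  Merge Join  (cost=...)
         ->  Sort  (cost=...)
               Sort Key: a.id
               ->  Seq Scan on big_a a  (cost=...)
         ->  Sort  (cost=...)
               Sort Key: b.id
               ->  Seq Scan on big_b b  (cost=...)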

Set this value *at your own risk* and only after *significant* testing.
Having it too high can have rather unexpected consequences. Setting it
to 1 or 2GB, unless you have VERY few threads, or a TON of memory, is a
very, very bad idea.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com


Re: Allow sorts to use more available memory

From: Andy Colson
On 9/12/2011 12:57 PM, Shaun Thomas wrote:
> On 09/12/2011 12:47 PM, Andy Colson wrote:
>
>> work_mem is not the total a query can use. I believe each step (each
>> sort or hash) can use that much, and each backend can use it for
>> multiple steps. So if you had two backends, each doing two sorts,
>> you'd use 2 * 2 = 4 sorts * 2GB = 8GB.
>
> Exactly. Find a big query somewhere in your system. Use EXPLAIN to
> examine it.

Yeah, and even better, on PG 9, if you EXPLAIN ANALYZE it'll show you
just how much memory is actually being used.
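
For example (made-up query and numbers; the "Sort Method" line is the
one to look at):

EXPLAIN ANALYZE SELECT * FROM big_a ORDER BY ts;

 Sort  (cost=... rows=...) (actual time=... rows=... loops=1)
   Sort Key: ts
   Sort Method:  external merge  Disk: 352000kB
   ->  Seq Scan on big_a  (...)

"quicksort  Memory: ...kB" means the sort fit in work_mem; "external
merge  Disk: ...kB" means it spilled to disk.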

-Andy

Re: Allow sorts to use more available memory

From: Robert Schnabel
On 9/12/2011 12:57 PM, Shaun Thomas wrote:
> On 09/12/2011 12:47 PM, Andy Colson wrote:
>
>> work_mem is not the total a query can use. I believe each step (each
>> sort or hash) can use that much, and each backend can use it for
>> multiple steps. So if you had two backends, each doing two sorts,
>> you'd use 2 * 2 = 4 sorts * 2GB = 8GB.
> Exactly. Find a big query somewhere in your system. Use EXPLAIN to
> examine it. Chances are, that one query has one or more sorts. Each one
> of those gets its own work_mem. Each sort. Does the query have four
> sorts? It may use 4*work_mem. On a whim a while back, I doubled our
> 8MB setting to
> 16MB on a test system. During a load test, the machine ran out of
> memory, swapped out, and finally crashed after the OOM killer went nuts.
>
> Set this value *at your own risk* and only after *significant* testing.
> Having it too high can have rather unexpected consequences. Setting it
> to 1 or 2GB, unless you have VERY few threads, or a TON of memory, is a
> very, very bad idea.
>
Yep, I know.  But in the context of the data warehouse where *I'm the
only user* and I have a query that does, say 4 large sorts like
http://explain.depesz.com/s/BrAO and I have 32GB RAM I'm not worried
about using 8GB or 16GB in the case of work_mem = 4GB.  I realize the
query above only used 1.9GB for the largest sort but I know I have other
queries with 1 or 2 sorts that I've watched go to disk.

Bob




Re: Allow sorts to use more available memory

From: Andy Colson
On 9/12/2011 1:22 PM, Robert Schnabel wrote:
>
> [...]
>
> Yep, I know. But in the context of the data warehouse where *I'm the
> only user* and I have a query that does, say 4 large sorts like
> http://explain.depesz.com/s/BrAO and I have 32GB RAM I'm not worried
> about using 8GB or 16GB in the case of work_mem = 4GB. I realize the
> query above only used 1.9GB for the largest sort but I know I have other
> queries with 1 or 2 sorts that I've watched go to disk.
>
> Bob

Wow, you are getting close to the limits there.  Another thing you can
do is mount a tmpfs (which lives in RAM) for the temp area and then
just let the sort spill to it.

-Andy


Re: Allow sorts to use more available memory

From: Andy Colson
On 9/12/2011 1:22 PM, Robert Schnabel wrote:
>
> [...]
>
> Yep, I know. But in the context of the data warehouse where *I'm the
> only user* and I have a query that does, say 4 large sorts like
> http://explain.depesz.com/s/BrAO and I have 32GB RAM I'm not worried
> about using 8GB or 16GB in the case of work_mem = 4GB. I realize the
> query above only used 1.9GB for the largest sort but I know I have other
> queries with 1 or 2 sorts that I've watched go to disk.
>
> Bob

Huge guess here (I can't see the select or DDL), but it looks like all
the tables are sequential scans.  It might help to add an index or two;
then the table joins could be done much more efficiently with a lot
less memory.

-Andy

Re: Allow sorts to use more available memory

From: pasman pasmański
I think you may add a ramdisk as a tablespace for temporary tables.
This should work similarly to a bigger work_mem.
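
A sketch of what I mean (path and names invented; the ramdisk itself
has to be created and mounted at the OS level first, and the directory
must be owned by the postgres user):

-- one-time setup, as a superuser:
CREATE TABLESPACE ram_temp LOCATION '/mnt/pg_ramdisk';

-- then, per session, temp files and temp tables land there:
SET temp_tablespaces = 'ram_temp';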

2011/9/12, Robert Schnabel <schnabelr@missouri.edu>:
>
> [...]
>
> Yep, I know.  But in the context of the data warehouse where *I'm the
> only user* and I have a query that does, say 4 large sorts like
> http://explain.depesz.com/s/BrAO and I have 32GB RAM I'm not worried
> about using 8GB or 16GB in the case of work_mem = 4GB.  I realize the
> query above only used 1.9GB for the largest sort but I know I have other
> queries with 1 or 2 sorts that I've watched go to disk.
>
> Bob


--
------------
pasman

Re: Allow sorts to use more available memory

From: Scott Marlowe
On Mon, Sep 12, 2011 at 11:33 AM, Robert Schnabel
<schnabelr@missouri.edu> wrote:
> The recent "data warehouse" thread made me think about how I use work_mem
> for some of my big queries.  So I tried SET work_mem = '4GB' for a session
> and got
>
> ERROR: 4194304 is outside the valid range for parameter "work_mem" (64 ..
> 2097151)

Ubuntu 10.10, pgsql 8.4.8:

smarlowe=# set work_mem='1000GB';
SET

Re: Allow sorts to use more available memory

From: Robert Schnabel
On 9/12/2011 3:58 PM, Scott Marlowe wrote:
> On Mon, Sep 12, 2011 at 11:33 AM, Robert Schnabel
> <schnabelr@missouri.edu>  wrote:
>> The recent "data warehouse" thread made me think about how I use work_mem
>> for some of my big queries.  So I tried SET work_mem = '4GB' for a session
>> and got
>>
>> ERROR: 4194304 is outside the valid range for parameter "work_mem" (64 ..
>> 2097151)
> Ubuntu 10.10, pgsql 8.4.8:
>
> smarlowe=# set work_mem='1000GB';
> SET

Ok, so is this a limitation related to the Windows implementation?

And getting back to the to-do list entry and reading the related posts,
it appears that even if you could set work_mem that high it would only
use 2GB anyway.  I guess that was the second part of my question.  Is
that true?


Re: Allow sorts to use more available memory

From: Robert Schnabel
On 9/12/2011 1:57 PM, Andy Colson wrote:
> On 9/12/2011 1:22 PM, Robert Schnabel wrote:
>> [...]
>
> Huge guess here (I can't see the select or DDL), but it looks like all
> the tables are sequential scans.  It might help to add an index or two;
> then the table joins could be done much more efficiently with a lot
> less memory.
>
> -Andy
In this case I doubt it.  Basically what these queries are doing is
taking table1 (~30M rows) and finding all the rows with a certain
condition.  This produces ~15M rows.  Then I have to find all of those
15M rows that are present in table2.  In the case of the query above
this results in 1.79M rows.  Basically, the 15M rows that meet the
condition for table1 have matching rows spread out over 10 different
tables (table2's).

Actually, you just gave me an idea.  When I generate the "table1" I can
probably add a field that tells me which "table2" it came from for each
row that satisfies my criteria.  Sometimes just having someone else make
you think is very productive. :-)
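
Something along these lines, hypothetically (table and column names
made up):

-- tag each qualifying row with the table2 it was found in
CREATE TABLE matches AS
SELECT t1.id, 'table2_a'::text AS source_table
FROM table1 t1
JOIN table2_a USING (id)
WHERE t1.meets_condition;

INSERT INTO matches
SELECT t1.id, 'table2_b'
FROM table1 t1
JOIN table2_b USING (id)
WHERE t1.meets_condition;
-- ... and so on for the other table2's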

Thanks
Bob


Re: Allow sorts to use more available memory

From: Tom Lane
Robert Schnabel <schnabelr@missouri.edu> writes:
> On 9/12/2011 3:58 PM, Scott Marlowe wrote:
>> On Mon, Sep 12, 2011 at 11:33 AM, Robert Schnabel
>> <schnabelr@missouri.edu>  wrote:
>>> The recent "data warehouse" thread made me think about how I use work_mem
>>> for some of my big queries.  So I tried SET work_mem = '4GB' for a session
>>> and got
>>> ERROR: 4194304 is outside the valid range for parameter "work_mem" (64 ..
>>> 2097151)

>> Ubuntu 10.10, pgsql 8.4.8:
>> smarlowe=# set work_mem='1000GB';
>> SET

> Ok, so is this a limitation related to the Windows implementation?

Yeah.  If you look into guc.c you'll find this:

/* upper limit for GUC variables measured in kilobytes of memory */
/* note that various places assume the byte size fits in a "long" variable */
#if SIZEOF_SIZE_T > 4 && SIZEOF_LONG > 4
#define MAX_KILOBYTES    INT_MAX
#else
#define MAX_KILOBYTES    (INT_MAX / 1024)
#endif
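
(INT_MAX / 1024 is 2147483647 / 1024 = 2097151 kB, i.e. just under 2GB,
which is exactly the upper bound in the error message you got.)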

Since Windows, more or less alone among known 64-bit operating systems,
chose not to make "long" the same width as pointers, these values get
restricted just as if you were on a 32-bit machine.  Few Postgres
developers use Windows enough to get excited about doing all the tedious
(and bug-prone) gruntwork that would be required to fix this.

            regards, tom lane

Re: Allow sorts to use more available memory

From: Stephen Frost
* Robert Schnabel (schnabelr@missouri.edu) wrote:
> And getting back to the to-do list entry and reading the related
> posts, it appears that even if you could set work_mem that high it
> would only use 2GB anyway.  I guess that was the second part of my
> question.  Is that true?

Yes and no.  work_mem is used by the planner to figure out what kind of
plan to use.  The planner plans things based on statistics, but it's not
perfect, especially on large tables with lots of data that have
dependencies between columns.

Where the 2GB limit comes into play is when you end up with a plan that
does, say, a large sort.  PG will use memory for the sort up to
work_mem, or 2GB, whichever is lower, and spill to disk after that.  I
don't believe it has such a limit for a hash table, due to how the data
structures for the hash table are allocated (and I recall seeing single
PG queries that use hash tables getting into the 30+GB range; of course,
I had work_mem set upwards of 100GB on a 32GB box... :).

So, if you're doing data warehousing, and you're pretty much the only
user (or there's only one at a time), setting it pretty high is
acceptable, but you do need to watch the box and make sure you don't run
it out of memory.  Also, make sure you have things configured correctly,
if you're using Linux, to prevent the OOM killer from kicking in.  Also,
as I suggested before, set it to a reasonable level for the 'default'
and just up it for specific queries that may benefit from it.
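
For instance (names and values only illustrative):

-- postgresql.conf: work_mem = 64MB   (a modest default)

-- then, for just the one big query:
BEGIN;
SET LOCAL work_mem = '1GB';  -- reverts automatically at COMMIT/ROLLBACK
SELECT * FROM big_table ORDER BY some_col;
COMMIT;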

        Thanks,

            Stephen


Re: Allow sorts to use more available memory

From: Stephen Frost
* Robert Schnabel (schnabelr@missouri.edu) wrote:
> And getting back to the to-do list entry and reading the related
> posts, it appears that even if you could set work_mem that high it
> would only use 2GB anyway.  I guess that was the second part of my
> question.  Is that true?

Errr, and to get back to the to-do (which I've been considering doing
something about...): it's to allow things like sorts to *actually* use
more than 2GB of memory, but as others have pointed out, you can
approximate that by putting pgsql_tmp on a memory filesystem and
letting the sorts spill to the memory-based FS.

    Thanks,

        Stephen


Re: Allow sorts to use more available memory

From: Bruce Momjian
Stephen Frost wrote:
> * Robert Schnabel (schnabelr@missouri.edu) wrote:
> > And getting back to the to-do list entry and reading the related
> > posts, it appears that even if you could set work_mem that high it
> > would only use 2GB anyway.  I guess that was the second part of my
> > question.  Is that true?
>
> Errr, and to get back to the to-do (which I've been considering doing
> something about...): it's to allow things like sorts to *actually* use
> more than 2GB of memory, but as others have pointed out, you can
> approximate that by putting pgsql_tmp on a memory filesystem and
> letting the sorts spill to the memory-based FS.

It would be nice if the tmpfs approach allowed us to control total temp
memory usage, except that it causes a failure rather than spilling to
real disk.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +