using a lot of maintenance_work_mem - Mailing list pgsql-hackers

From Frederik Ramm
Subject using a lot of maintenance_work_mem
Date
Msg-id 4D593814.7030908@remote.org
Whole thread Raw
Responses Re: using a lot of maintenance_work_mem  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: using a lot of maintenance_work_mem  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
Hi,
    I am (ab)using a PostgreSQL database (with PostGIS extension) in a 
large data processing job - each day, I load several GB of data, run a 
lot of analyses on it, and then throw everything away again. Loading, 
running, and dumping the results takes about 18 hours every day.

The job involves a lot of index building and sorting, and is run on a 
64-bit machine with 96 GB of RAM.

Naturally I would like the system to use as much RAM as possible before 
resorting to disk-based operations, but no amount of 
maintenance_work_mem setting seems to make it do my bidding.

I'm using PostgreSQL 8.3 but would be willing and able to upgrade to any 
later version.

Some googling has unearthed the issue - which is likely known to all of 
you, just repeating it to prove I've done my homework - that tuplesort.c 
always tries to double its memory allocation, and will refuse to do so 
if that results in an allocation greater than MaxAllocSize:
 if ((Size) (state->memtupsize * 2) >= MaxAllocSize / sizeof(SortTuple))     return false;

And MaxAllocSize is hardcoded to 1 GB in memutils.h.

(All this based on Postgres 9.1alpha source - I didn't want to bring 
something up that has been fixed already.)

Now I assume that there are reasons that you're doing this. memutils.h 
has the (for me) cryptic comment about MaxAllocSize: "XXX This is 
deliberately chosen to correspond to the limiting size of varlena 
objects under TOAST. See VARATT_MASK_SIZE in postgres.h.", but 
VARATT_MASK_SIZE has zero other occurences in the source code.

If I were to either (a) increase MaxAllocSize to, say, 48 GB instead of 
1 GB, or (b) hack tuplesort.c to ignore MaxAllocSize, just for my local 
setup - would that likely be viable in my situation, or would I break 
countless things?

I can afford some experimentation; as I said, I'm throwing away the 
database every day anyway. I just thought I'd solicit your advice before 
I do anything super stupid. - If I can use my setup to somehow 
contribute to further PostgreSQL development by trying out some things, 
I'll be more than happy to do so. I do C/C++ but apart from building 
packages for several platforms, I haven't worked with the PostgreSQL 
source code.

Of course the cop-out solution would be to just create a huge RAM disk 
and instruct PostgreSQL to use that for disk-based sorting. I'll do that 
if all of you say "OMG don't touch MaxAllocSize" ;)

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik@remote.org  ##  N49°00'09" E008°23'33"


pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Add support for logging the current role
Next
From: Tom Lane
Date:
Subject: Re: using a lot of maintenance_work_mem