Tom,
> My own thoughts about the problems with our work_mem arrangement are
> that the real problem is the rule that we can allocate work_mem per sort
> or hash operation; this makes the actual total memory use per backend
> pretty unpredictable for nontrivial queries. I don't know how to fix
> this though. The planner needs to know the work_mem that will be used
> for any one of these operations in order to estimate costs, so simply
> trying to divide up work_mem among the operations of a completed plan
> tree is not going to improve matters.
Yes ... the unpredictability is the problem:
(1) We can only allocate the # of connections and default work_mem per
operation.
(2) There are a variable # of concurrent queries per connection (0..1)
(3) Each query has a variable # of operations requiring work_mem, which
will require a variable amount of work_mem. If your malloc is good, this
is limited to concurrent operations, but for some OSes this is all
operations per query. Thus the former uses 0..3xwork_mem per query and
the latter 0..7x in general practice.
Overall, this means that based on a specific max_connections and work_mem,
a variable amount between 1*work_mem and (connections*3*work_mem) memory
may be needed at once. Since the penalty for overallocating RAM is severe
on most OSes, DBAs are forced to allow for the worst case. This results
in around 2/3 underallocation on systems with unpredictable loads. This
means that, even on heavily loaded DB systems, most of the time you're
wasting a big chunk of your RAM.
Simon and I met about this (and other stuff) at the GreenPlum offices last
summer. The first plan we came up with is query queueing. Query
queueing would eliminate variability (2), which would then make the only
variability one of how much work_mem would be needed per query, reducing
(but not eliminating) the underallocation. Additionally, we thought to
tie the query queues to ROLES, which would allow the administrator to
better control how much work_mem per type of query was allowed. It would
also allow admins to balance priorities better on mixed-load machines.
Mind you, I'm also thinking that on enterprise installations with
multi-department use of the database, the fact that work_mem is
inalienably USERSET is also an allocation problem. One user with a SET
command can blow all of your resource planning away.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco