Thread: Running out of disk space during query

Running out of disk space during query

From: Stephen Frost
Greetings,

* Simon Riggs (simon@2ndquadrant.com) wrote:
> work_mem= 1 GB        benefit at 8 TB
> work_mem= 256MB         benefit at 0.5 TB
> (based upon runs on average twice size of memory, and each logical tape
> requiring 256KB memory, i.e. min(work_mem/4, 6) * work_mem * 2, which
> for work_mem > 2 MB gives 0.5 * work_mem^2)

Seeing this reminded me of an issue I ran into recently.  In 8.1 on a
database that's only 16G, I ran a query that chewed up all the available
disk space (about 250G, yes, 0.25TB) on the partition and then failed.
Of course, this took many hours on a rather speedy box (and the disk
array is a pretty nice IBM SAN so it's not exactly a slacker either) and
produced nothing for me.

I'd like to think Postgres often has some idea of the total disk space a
given query will use prior to actually running the whole query and just
seeing how much space it took at its peak.  If this can be estimated with
some confidence then it'd be neat if Postgres could either check whether
there's enough disk space available and bail if not (I know, difficult to
do cross-platform, and there are tablespaces and whatnot to consider) OR
offer a parameter along the lines of "max_temp_disk_space" which would
fail the query if it would be exceeded.  The latter could even be two GUC
variables, one administrator-set and unchangeable by the user ('hard'
limit) and one settable by the user with a sane default ('soft' limit),
perhaps with a HINT in the error message indicating how to raise the
limit when it is hit.
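A minimal sketch, in Python rather than the backend's C, of how such a
soft/hard pair of limits might behave; every name here is hypothetical
(neither max_temp_disk_space nor the hard-limit setting exists as a GUC):

```python
class TempSpaceError(Exception):
    """Query aborted for exceeding a temp-disk limit."""
    def __init__(self, message, hint=None):
        super().__init__(message)
        self.hint = hint

# Hypothetical settings, both in bytes: an administrator-set hard
# limit the user cannot change, and a user-settable soft limit.
HARD_LIMIT = 10 * 1024**3   # 10 GB, fixed by the admin
soft_limit = 1 * 1024**3    # 1 GB default, raisable per session

def check_temp_usage(bytes_used):
    """Fail the query once its temp-file usage crosses a limit."""
    if bytes_used > HARD_LIMIT:
        raise TempSpaceError("temporary disk usage exceeds hard limit")
    if bytes_used > soft_limit:
        raise TempSpaceError(
            "temporary disk usage exceeds soft limit",
            hint="SET max_temp_disk_space to raise it, "
                 "up to the administrator's hard limit")
```

The HINT on the soft limit is the point of the two-variable scheme: the
user can retry after raising their own setting, while the hard limit
stays in the administrator's hands.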

I suppose I could put quotas in place or something, but I don't really
have a problem with the database as a whole using up a bunch of disk
space (hence why it's got a lot of room to grow into); I just would have
liked a "this will chew up more disk space than you have and then fail"
message instead of what ended up happening for this query.
Thanks!
    Stephen

Re: Running out of disk space during query

From: Tom Lane
Stephen Frost <sfrost@snowman.net> writes:
> I suppose I could put quotas in place or something, but I don't really
> have a problem with the database as a whole using up a bunch of disk
> space (hence why it's got a lot of room to grow into); I just would have
> liked a "this will chew up more disk space than you have and then fail"
> message instead of what ended up happening for this query.

I've got the same problem with this that I do with the recently-proposed
patch to fail queries with estimated cost > X --- to wit, I think it
will result in a net *reduction* in system reliability not an improvement.
Any such feature changes the planner estimates from mere heuristics into
a gating factor that will make queries fail entirely.  And they are
really not good enough to put that kind of trust into.
        regards, tom lane


Re: Running out of disk space during query

From: Stephen Frost
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I've got the same problem with this that I do with the recently-proposed
> patch to fail queries with estimated cost > X --- to wit, I think it
> will result in a net *reduction* in system reliability not an improvement.
> Any such feature changes the planner estimates from mere heuristics into
> a gating factor that will make queries fail entirely.  And they are
> really not good enough to put that kind of trust into.

Perhaps instead then have the system fail the query once it's gone
beyond some configurable limit on temporary disk usage?  The query still
would have run for a while but it wouldn't have run the partition out of
space and would have come back faster at least.

Comparing this to work_mem: do we do something like this there?  I don't
think we do, which means we're trusting the planner's estimate to get
the memory size right, and that can end up being way off, resulting in
queries taking up well beyond what work_mem would normally allow...  I
recall a lot of discussion but don't recall whether anything was actually
done to resolve that issue either.

It seems to me we should probably not trust the planner's estimates, and
therefore implement checks that fail things once we've gone well beyond
what we expected to use.  If we've done this for work_mem then using
whatever we did there for a 'temporary disk space limit' would at least
make me happy.  If we haven't, then perhaps we should do something for
both.
Thanks!
    Stephen

Re: Running out of disk space during query

From: Simon Riggs
On Wed, 2006-03-08 at 08:33 -0500, Stephen Frost wrote:
> Greetings,
> 
> * Simon Riggs (simon@2ndquadrant.com) wrote:
> > work_mem= 1 GB        benefit at 8 TB
> > work_mem= 256MB         benefit at 0.5 TB
> > (based upon runs on average twice size of memory, and each logical tape
> > requiring 256KB memory, i.e. min(work_mem/4, 6) * work_mem * 2, which
> > for work_mem > 2 MB gives 0.5 * work_mem^2)
> 
> Seeing this reminded me of an issue I ran into recently.  In 8.1 on a
> database that's only 16G, I ran a query that chewed up all the available
> disk space (about 250G, yes, 0.25TB) on the partition and then failed.
> Of course, this took many hours on a rather speedy box (and the disk
> array is a pretty nice IBM SAN so it's not exactly a slacker either) and
> produced nothing for me.
> 
> I'd like to think Postgres often has some idea of the total disk space a
> given query will use prior to actually running the whole query and just
> seeing how much space it took at its peak.  If this can be estimated with
> some confidence then it'd be neat if Postgres could either check whether
> there's enough disk space available and bail if not (I know, difficult to
> do cross-platform, and there are tablespaces and whatnot to consider) OR
> offer a parameter along the lines of "max_temp_disk_space" which would
> fail the query if it would be exceeded.  The latter could even be two GUC
> variables, one administrator-set and unchangeable by the user ('hard'
> limit) and one settable by the user with a sane default ('soft' limit),
> perhaps with a HINT in the error message indicating how to raise the
> limit when it is hit.
> 
> I suppose I could put quotas in place or something, but I don't really
> have a problem with the database as a whole using up a bunch of disk
> space (hence why it's got a lot of room to grow into); I just would have
> liked a "this will chew up more disk space than you have and then fail"
> message instead of what ended up happening for this query.

We can do "work_space" and "maintenance_work_space" fairly easily. We
know how much we are writing, so we don't need to ask the OS how much it
has left; we just compare against the parameter and assume that it has
been set correctly by the admin.
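Since each backend already knows how many temp bytes it writes, the check
described here is just bookkeeping at write time; a rough sketch, where
work_space is the hypothetical parameter from above, not an existing GUC:

```python
class WorkSpaceExceeded(Exception):
    """Raised instead of filling the partition and failing later."""
    pass

class TempFileWriter:
    """Tracks cumulative temp-file bytes written by one backend and
    fails the query as soon as the work_space limit would be exceeded,
    without ever asking the OS how much space is actually free."""
    def __init__(self, work_space_bytes):
        self.limit = work_space_bytes
        self.written = 0

    def write(self, nbytes):
        if self.written + nbytes > self.limit:
            raise WorkSpaceExceeded(
                "query would exceed work_space (%d bytes)" % self.limit)
        self.written += nbytes   # real code would also write the data
```

The comparison happens before each write, so the query aborts at the
moment the limit is crossed rather than hours later when the disk fills.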

Personally, I would rather abort a large sort before we ran for many
hours and then hit those limits. That was the purpose of the
statement_cost_limit parameter mentioned just recently.

Top-down space allocation is essentially the same problem as top-down
memory allocation. In both memory and temp space we have a hard limit
beyond which bad things happen. ISTM that we would like to logically
allocate these resources from central pool(s) and then reclaim or return
the allocation when we're done with it. In both cases the actual physical
allocation would be made by the individual backend. It's fairly easy to
track overall space, but it's somewhat harder to force a single query to
work within a single allocation, since multiple steps might well want to
allocate the same work_mem and have been optimized to expect they will
get that size of allocation...
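The central-pool idea might look roughly like this; a sketch only, with
one shared pool from which backends take logical reservations and return
them when done (nothing here corresponds to actual backend code):

```python
class PoolExhausted(Exception):
    pass

class SpacePool:
    """Central logical pool for memory or temp space.  Backends
    reserve space up front and return it when done; the physical
    allocation is still made by each backend itself."""
    def __init__(self, total):
        self.total = total
        self.reserved = 0

    def allocate(self, amount):
        # Logical reservation only; fails rather than overcommitting.
        if self.reserved + amount > self.total:
            raise PoolExhausted("pool cannot cover this allocation")
        self.reserved += amount
        return amount

    def release(self, amount):
        # Return the reservation so other backends can use it.
        self.reserved -= amount
```

The hard part noted above doesn't show in a sketch this small: a single
query with several sort or hash steps would make several allocate() calls,
each step having been optimized to expect a full work_mem-sized grant.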

Best Regards, Simon Riggs