Thread: Running out of disk space during query
Greetings,

* Simon Riggs (simon@2ndquadrant.com) wrote:
> work_mem = 1 GB   benefit at 8 TB
> work_mem = 256MB  benefit at 0.5 TB
> (based upon runs on average twice size of memory, and each logical tape
> requiring 256KB memory, i.e. min(work_mem/4, 6) * work_mem * 2, which
> for work_mem > 2 MB gives 0.5 * work_mem^2)

Seeing this reminded me of an issue I ran into recently. In 8.1, on a
database that's only 16G, I ran a query that chewed up all the available
disk space (about 250G, yes, 0.25TB) on the partition and then failed.
Of course, this took many hours on a rather speedy box (and the disk
array is a pretty nice IBM SAN, so it's not exactly a slacker either)
and produced nothing for me.

I'd like to think that Postgres often has some idea of the total disk
space a given query is going to use before actually running the whole
query and simply seeing how much space it took at its peak. If that can
be estimated with some confidence, it would be neat if Postgres could
either check whether enough disk space is available and, if not, bail
(I know, difficult to do cross-platform, and there are tablespaces and
whatnot to consider), OR if there were a parameter along the lines of
"max_temp_disk_space" which would fail the query if it would be exceeded.
The latter could even be two GUC variables: one set by the administrator
and unchangeable by the user (a 'hard' limit), and one settable by the
user with a sane default (a 'soft' limit), perhaps with a HINT in the
error message indicating how to raise it when the limit is hit.

I suppose I could put quotas in place or something, but I don't really
have a problem with the database as a whole using up a bunch of disk
space (hence why it's got a lot of room to grow into); I just would have
liked a "this will chew up more disk space than you have and then fail"
message instead of what ended up happening for this query.

Thanks!

	Stephen
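As a rough illustration of the "max_temp_disk_space" idea: none of this exists in the backend, and the GUC name, the counter, and the hook are all hypothetical, but a per-query check at the point where a temporary file is extended might look something like the sketch below (a real backend would raise the error with ereport(ERROR) and a HINT rather than exiting).

    /* Hypothetical sketch only: neither this GUC nor this accounting
     * exists in PostgreSQL; the names are invented for illustration. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    /* Limit on temp-file bytes a single query may write; -1 = unlimited. */
    static int64_t max_temp_disk_space = -1;

    /* Running total of temp-file bytes written by the current query. */
    static int64_t temp_bytes_used = 0;

    /* Imagined hook, called before each write that extends a temp file. */
    static void
    check_temp_space(int64_t bytes_about_to_write)
    {
        if (max_temp_disk_space >= 0 &&
            temp_bytes_used + bytes_about_to_write > max_temp_disk_space)
        {
            /* A real backend would use ereport(ERROR, ...) with a HINT
             * about raising the soft limit, not a process exit. */
            fprintf(stderr,
                    "temp file usage would exceed max_temp_disk_space (%lld bytes)\n",
                    (long long) max_temp_disk_space);
            exit(1);    /* stand-in for aborting just the query */
        }
        temp_bytes_used += bytes_about_to_write;
    }

The point of checking at write time rather than plan time is that it needs no estimate at all: the query fails only once it has actually consumed more temp space than the limit allows.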
Stephen Frost <sfrost@snowman.net> writes:
> I suppose I could put quotas in place or something but I don't really
> have a problem with the database as a whole using up a bunch of disk
> space (hence why it's got a lot of room to grow into), I just would have
> liked a "this will chew up more disk space than you have and then fail"
> message instead of what ended up happening for this query.

I've got the same problem with this that I do with the recently-proposed
patch to fail queries with estimated cost > X --- to wit, I think it
will result in a net *reduction* in system reliability, not an
improvement. Any such feature changes the planner estimates from mere
heuristics into a gating factor that will make queries fail entirely.
And they are really not good enough to put that kind of trust into.

			regards, tom lane
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I've got the same problem with this that I do with the recently-proposed
> patch to fail queries with estimated cost > X --- to wit, I think it
> will result in a net *reduction* in system reliability, not an
> improvement. Any such feature changes the planner estimates from mere
> heuristics into a gating factor that will make queries fail entirely.
> And they are really not good enough to put that kind of trust into.

Perhaps instead have the system fail the query once it has gone beyond
some configurable limit on temporary disk usage? The query would still
have run for a while, but it wouldn't have run the partition out of
space, and at least it would have come back sooner.

Comparing this to work_mem: do we do something like this there? I don't
think we do, which means we're trusting the planner's estimate of memory
use to be right, and that can end up being way off, resulting in queries
using well beyond what work_mem would normally allow... I recall a lot
of discussion but don't recall whether anything was actually done to
resolve that issue either.

It seems to me we should probably not trust the planner's estimates and
therefore implement runtime checks that fail a query once it has gone
well beyond what we expected it to use. If we've done this for work_mem,
then using the same approach for a 'temporary disk space limit' would at
least make me happy. If we haven't, then perhaps we should do something
for both.

Thanks!

	Stephen
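For what it's worth, the pattern being asked for here is runtime accounting rather than trust in a plan-time estimate: the executor counts what it has actually used and reacts when the counter crosses a limit. A simplified sketch of that pattern for the memory side, with invented names rather than the actual tuplesort code, is below; the same accounting shape would apply to a temp-space counter.

    #include <stdbool.h>
    #include <stddef.h>

    /* Invented names for illustration; not the real sort structures. */
    typedef struct SortState
    {
        size_t  mem_allowed;   /* work_mem, in bytes */
        size_t  mem_used;      /* bytes actually accumulated so far */
        bool    spilling;      /* switched to disk-based sorting? */
    } SortState;

    /* Account for a new in-memory tuple; decide at run time, not plan
     * time, whether we have to start spilling to temp files. */
    static void
    sort_add_tuple(SortState *state, size_t tuple_bytes)
    {
        state->mem_used += tuple_bytes;

        if (!state->spilling && state->mem_used > state->mem_allowed)
        {
            /* Switch to an external sort; from here on, disk usage grows
             * instead of memory -- which is exactly where a temp-space
             * counter and limit could hook in. */
            state->spilling = true;
        }
    }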
On Wed, 2006-03-08 at 08:33 -0500, Stephen Frost wrote:
> [...]
> it would be neat if Postgres could either check whether enough disk
> space is available and, if not, bail (I know, difficult to do
> cross-platform, and there are tablespaces and whatnot to consider), OR
> if there were a parameter along the lines of "max_temp_disk_space"
> which would fail the query if it would be exceeded.
> [...]

We can do "work_space" and "maintenance_work_space" fairly easily. We
know how much we are writing, so we don't need to ask the OS how much
space it has left; we just compare against the parameter and assume it
has been set correctly by the admin.

Personally, I would rather abort a large sort before it has run for many
hours and then hit those limits. That was the purpose of the
statement_cost_limit parameter mentioned just recently.

Top-down space allocation is essentially the same problem as top-down
memory allocation. For both memory and temp space we have a hard limit
beyond which bad things happen. ISTM that we would like to logically
allocate these resources from central pool(s) and then reclaim or return
the allocation when we're done with it. In both cases the actual
physical allocation would be made by the individual backend. It's fairly
easy to track overall space, but it's somewhat harder to force a single
query to work within a single allocation, since multiple steps might
well want to allocate the same work_mem and have been optimized to
expect they will get that size of allocation...

Best Regards,
Simon Riggs
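One way to picture the central-pool idea: each backend reserves temp space from a shared counter before creating its spill files and returns the reservation when the files are dropped. The sketch below is purely hypothetical, with invented names, and uses a pthread mutex only as a stand-in for what would really be a shared-memory structure protected by an LWLock in the backend.

    #include <stdbool.h>
    #include <stdint.h>
    #include <pthread.h>

    /* Hypothetical cluster-wide pool of temp space shared by all backends. */
    typedef struct TempSpacePool
    {
        pthread_mutex_t lock;            /* stand-in for an LWLock */
        int64_t         total_bytes;     /* admin-configured overall cap */
        int64_t         reserved_bytes;  /* sum of outstanding reservations */
    } TempSpacePool;

    /* Try to reserve 'bytes' of temp space; refuse (and so fail the
     * requesting query) rather than overcommit the pool. */
    static bool
    temp_space_reserve(TempSpacePool *pool, int64_t bytes)
    {
        bool ok;

        pthread_mutex_lock(&pool->lock);
        ok = (pool->reserved_bytes + bytes <= pool->total_bytes);
        if (ok)
            pool->reserved_bytes += bytes;
        pthread_mutex_unlock(&pool->lock);
        return ok;
    }

    /* Return a reservation once the temp files backing it are gone. */
    static void
    temp_space_release(TempSpacePool *pool, int64_t bytes)
    {
        pthread_mutex_lock(&pool->lock);
        pool->reserved_bytes -= bytes;
        pthread_mutex_unlock(&pool->lock);
    }

The harder part Simon points out is not the bookkeeping itself but deciding how much each step of a single query may reserve when several steps each expect a full work_mem-sized allocation.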