How to cope with low disk space

From: Michiel Holtkamp
Hello list,

Just joined, read the archives, but couldn't find a solution to my
problem. My question is: 'How can I best determine when to delete
data?'. Hopefully this is the right place to ask, otherwise kindly
redirect me to the proper list.

The situation is as follows: we use PostgreSQL 8.1 to store large
amounts of data (we are talking gigabytes). This data is stored as
large objects, and when we delete data we also delete the
corresponding large objects. Stored data is deleted after a while
(usually a couple of weeks), so far so good.

Due to the nature of the information (sound data, with recording
triggered by certain technical conditions), the amount of information
is not very predictable. Sometimes a lot of data is stored over a
period of a few days and the disk runs out of free space (this is not
theoretical; in one case it already happened). For this situation we
decided that we don't mind deleting some data earlier than normal, to
ensure that we can store newly generated data (newer data is more
important than older data).

The problem:
Somehow we have to decide when to delete data earlier than normal. We
can't do this by checking free disk space, because Postgres holds on
to disk space it has already allocated. Disk space can be freed by
doing a full vacuum, but this locks tables and could cause data to be
lost. Besides, I don't mind that Postgres reserves disk space; it's
more efficient anyway.

If anyone has ideas about this problem, it would be greatly
appreciated; I'm sure this is a problem others have encountered. I've
already looked at some system tables (specifically
pg_stat_user_tables) and at docs like
http://www.postgresql.org/docs/8.1/static/diskusage.html
but so far no satisfying solution has emerged.
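
(To give an idea of the kind of size check I mean, a sketch using the
built-in size functions that are in core as of 8.1; these report space
allocated on disk, including pages that are reserved but internally
free:)

  -- total on-disk size of the current database, in bytes
  SELECT pg_database_size(current_database());

  -- on-disk size of the system table that holds large object data
  -- (including its index)
  SELECT pg_total_relation_size('pg_largeobject');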

Thanks,
Michiel Holtkamp


Re: How to cope with low disk space

From: "vincent"
> Sometimes a lot of data is stored over a period of a few days and the
> disk runs out of free space (this is not theoretical; in one case it
> already happened). [...]

I suppose expanding the amount of storage space is not an option? Hard
disks are cheap these days.

Also, what kind of workload does your server have? Surely you must be
able to vacuum at some point?


Re: How to cope with low disk space

From: "Peter Childs"


On 14/02/2008, Michiel Holtkamp <michiel.holtkamp@soundintel.com> wrote:
> [...] The problem:
> Somehow we have to decide when to delete data earlier than normal. We
> can't do this by checking free disk space, because Postgres holds on
> to disk space it has already allocated. [...]


I think what you need depends on a mix of free disk space and free
space map (FSM) usage. A plain VACUUM VERBOSE will tell you how full
the FSM is. You need either enough free disk space or a (relatively)
full FSM: when the FSM is empty, the database has no reusable pages
recorded and has to take new disk space.
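
For example (a plain, database-wide vacuum takes no exclusive locks;
the FSM summary appears at the end of the output, wording roughly as
8.1 prints it, numbers illustrative):

  VACUUM VERBOSE;

  -- tail of the output, roughly:
  --   INFO:  free space map contains 1033 pages in 58 relations
  --   DETAIL:  A total of 1184 page slots are in use (including overhead).
  --   1184 page slots are required to track all free space.
  --   Current limits are:  20000 page slots, 1000 relations, using 182 kB.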

I'm probably not making much sense.

Re: How to cope with low disk space

From: Bill Moran
In response to Michiel Holtkamp <michiel.holtkamp@soundintel.com>:

> The situation is as follows: we use PostgreSQL 8.1 to store large
> amounts of data (we are talking gigabytes). [...]
>
> The problem:
> Somehow we have to decide when to delete data earlier than normal.
> [...] I've already looked at some system tables (specifically
> pg_stat_user_tables) and at docs like
> http://www.postgresql.org/docs/8.1/static/diskusage.html
> but so far no satisfying solution has emerged.

Since you don't give any idea how much data is involved, let me iterate
through your choices, given the unknowns:

1) Buy more disk space.  It's cheap.  Get an external SCSI unit and a
   SCSI PCI card.
2) Work on your vacuum schedule.  You don't need to vacuum full all the
   time, and regular vacuum doesn't lock tables.  If you do regular
   vacuum often enough, you won't see significant bloat.
3) Reduce the data size.  You say this is audio data, can you reduce
   the bitrate or other storage factors so the data streams aren't so
   huge?

In general, if you're hitting the limits of your physical storage on
a regular basis, you either underestimated your storage requirements
(which means you should get more storage) or you're storing too much.

The _business_need_ needs to dictate how often you purge old data.
If you don't have enough hardware to meet the business need, you need
to add hardware.

If the business need is to store X gigabytes with no regard for how
old the data is, then you need to adjust your data storage methods
to work with that.  Create a table to store the size of each LO, and
run a regular maintenance job that purges old data when the used
size gets too big.
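
A sketch of what I mean (table and column names are purely
illustrative):

  -- track the size of each large object so the purge job can work
  -- against a byte budget instead of guessing
  CREATE TABLE lo_inventory (
      lo_oid     oid PRIMARY KEY,
      bytes      bigint NOT NULL,
      created_at timestamp NOT NULL DEFAULT now()
  );

  -- maintenance job: when SUM(bytes) exceeds the budget, unlink the
  -- oldest large objects, DELETE their rows, and run a plain vacuum
  SELECT lo_unlink(lo_oid)
    FROM lo_inventory
   ORDER BY created_at
   LIMIT 100;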

--
Bill Moran
http://www.potentialtech.com

Re: How to cope with low disk space

From: Michiel Holtkamp
First off, thank you all for such a quick response. I will reply on
several emails at the same time, because the answers overlap.

Bill Moran wrote:
> In response to Michiel Holtkamp <michiel.holtkamp@soundintel.com>:
>
> Since you don't give any idea how much data is involved, let me iterate
> through your choices, given the unknowns:
>
> 1) Buy more disk space.  It's cheap.  Get an external SCSI unit and a
>    SCSI PCI card.

Also suggested by Vincent. Normally this would be an option, but it
doesn't really solve the problem. I didn't explain this properly: the
low-disk-space problem is an exception to the rule. Normally we have
(more than) enough disk space to store the information, but sometimes
(due to hardware problems, or on purpose) the database is filled with
much more data.

To give you an idea of the figures we are talking about: say we have a
250 GB disk. Normally we would use about 4-8 GB for the database.
Sometimes we want more information for research, so we lower the
threshold that triggers recording. Because we don't know in advance
how much data this will give us, the disk can run low on free space.
Losing some of that data doesn't really matter (we will use only a
small portion of it anyway), as long as the disk doesn't completely
run out of space.

Running low on disk space can also happen when something goes wrong with
the recording trigger (this can be a hardware fault).

In both cases, we don't want to store even more data, but we will want
to make a selection based on age (newer data is more important).

Hopefully, I've explained this better now :)

> 2) Work on your vacuum schedule.  You don't need to vacuum full all the
>    time, and regular vacuum doesn't lock tables.  If you do regular
>    vacuum often enough, you won't see significant bloat.

Ok, this is good. Regular vacuum doesn't lock tables, as you said, and
this is what we want. However, vacuuming alone sometimes isn't enough.
As I explained above, my problem occurs as an exception. Vacuuming
reclaims space for reuse, but I want to detect when to delete _extra_
data (which vacuum can then reclaim) so that we don't run out of disk
space.

> 3) Reduce the data size.  You say this is audio data, can you reduce
>    the bitrate or other storage factors so the data streams aren't so
>    huge?

This is not an option. I won't bore you with the details :)

> In general, if you're hitting the limits of your physical storage on
> a regular basis, you either underestimated your storage requirements
> (which means you should get more storage) or you're storing too much.

Agreed.

> The _business_need_ needs to dictate how often you purge old data.
> If you don't have enough hardware to meet the business need, you need
> to add hardware.
>
> If the business need is to store X gigabytes with no regard for how
> old the data is, then you need to adjust your data storage methods
> to work with that.  Create a table to store the size of each LO, and
> run a regular maintenance job that purges old data when the used
> size gets too big.

We could do this, but then we are tracking internally how much space
we actually use (instead of the total space reserved). I was hoping
there was a way to ask Postgres how much space is reserved but not
actually in use.

Peter Childs [reformatted] wrote:
> I think what you need depends on a mix of free disk space and free
> space map (FSM) usage. A plain VACUUM VERBOSE will tell you how full
> the FSM is. You need either enough free disk space or a (relatively)
> full FSM: when the FSM is empty, the database has to take new disk
> space.

Maybe this is what I'm looking for. Is there any other way to retrieve
information on the fsm usage?

Thanks again for all your inputs.
Michiel



Re: How to cope with low disk space

From: Erik Jones
On Feb 14, 2008, at 10:34 AM, Michiel Holtkamp wrote:

> Maybe this is what I'm looking for. Is there any other way to
> retrieve information on the fsm usage?

Have a look at the pg_freespacemap contrib package.
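
Something like this once it's installed (I'm going from the contrib
docs here, so treat the view names as approximate and check that the
module exists for your release; it may be newer than 8.1):

  -- how many pages with free space is the FSM tracking right now?
  SELECT count(*) FROM pg_freespacemap_pages;

  -- per-relation breakdown of FSM usage
  SELECT * FROM pg_freespacemap_relations;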

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




Re: How to cope with low disk space

From: Michiel Holtkamp
Erik Jones wrote:
>
> On Feb 14, 2008, at 10:34 AM, Michiel Holtkamp wrote:
>> Maybe this is what I'm looking for. Is there any other way to retrieve
>> information on the fsm usage?
>
> Have a look at the pg_freespacemap contrib package.

Thank you! I will.

Regards,
Michiel


Re: How to cope with low disk space

From: Shane Ambler
Is postgresql the only thing using the disk space/partition?

Have you considered running a cron job that parses df output and
triggers a delete when disk usage reaches a set threshold? That would
also account for any unexpected non-PostgreSQL disk usage.

I would also think you'd want to consider the size of the old stored
data when deciding how many records to delete.


> To give you an idea of the figures we are talking about: Say we have
> a 250 GB disk. Normally we would use about 4-8 GB of database.


Given that you normally have 4-8 GB of data and only hit trouble when
a fault/error causes an excess of 200 GB, I would also think about
stopping the recording under those conditions. If it takes 200 GB of
data before purging kicks in, then purging *all* old records will only
buy you a short time of extra space, unless you also start purging the
beginning of the current erroneous recording.


I am also thinking that a cron job that emails/pages/SMSes you when
disk usage hits 50% would be a better solution: it would simply give
you a heads-up to find and fix the fault causing the excess usage.

--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

Re: How to cope with low disk space

From: Michiel Holtkamp
On Feb 14, 2008, at 19:24 , Shane Ambler wrote:
> Is postgresql the only thing using the disk space/partition?

There are some other things on the partition, but they don't change
in size.

> Have you considered running a cron job that parses df output and
> triggers a delete when disk usage reaches a set threshold? That
> would also account for any unexpected non-PostgreSQL disk usage.

Yes, but this is not ideal:

1) sometimes we want to keep a lot of data, just not so much that the
disk runs out of free space
2) if you trigger on a disk-space threshold alone, you'll have to do a
full vacuum to get the space back, and that causes problems in our
application (because of locking and possibly long vacuum times)

> I would also think you'd want to consider the size of the old stored
> data when deciding how many records to delete.

Good thinking. Luckily, our data is almost always the same size, so it
has little influence in this case.

Thanks for your input!
Michiel


Re: How to cope with low disk space

From: Bill Moran
In response to Michiel Holtkamp <michiel.holtkamp@soundintel.com>:

> On Feb 14, 2008, at 19:24 , Shane Ambler wrote:
> > Have you considered running a cron job that parses df output and
> > triggers a delete when disk usage reaches a set threshold?
>
> Yes, but this is not ideal:
>
> 1) sometimes we want to keep a lot of data, just not so much that the
> disk runs out of free space
> 2) if you trigger on a disk-space threshold alone, you'll have to do a
> full vacuum and that causes problems in our application

Not necessarily.  Let's say you trigger the cleanup job to run at 50%
used space, which is a pretty good place to start considering normal
operation is less than 5%.  At that point, if you run a normal vacuum,
40% or more of the space in the tables will be available for reuse.
Normal vacuum _will_ return _some_ disk space to the OS, but is nowhere
near as aggressive as vacuum full.  At that point, even if the tables
on disk take up 40% of the usable drive size, the free space available
inside them will be 90% or more, and the addition of more data will
reuse that space.
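
Roughly, the cleanup job would do something like this (thresholds
illustrative):

  -- 1. how much does the database occupy on disk right now?
  SELECT pg_database_size(current_database());

  -- 2. if that is over ~50% of the disk: delete the oldest rows,
  --    lo_unlink their large objects, then mark the space reusable
  --    with a plain (non-locking) vacuum:
  VACUUM;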

This _requires_ frequent vacuum, and accurate estimation of how quickly
you need to trigger the process.  It's probably going to take some
experimentation and babysitting on your part to get it right.

Were it me, I'd just add some hard drives to get the system up to about
1T of disk space.  If you can't get that budget, you'll have to be a
little more clever about it.

--
Bill Moran
http://www.potentialtech.com

Re: How to cope with low disk space

From: "Daniel Verite"
Michiel Holtkamp wrote:

> Running low on disk space can also happen when something goes wrong
> with the recording trigger (this can be a hardware fault). In both
> cases, we don't want to store even more data, but we will want to
> make a selection based on age (newer data is more important).

Did you think about testing pg_relation_size('pg_largeobject') against
a fixed threshold before allocating new large objects? If it's higher
than the threshold, you could either stop importing new data, or
lo_unlink older entries and immediately issue a plain vacuum before
continuing.
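
A sketch of that check (the 'recordings' table, its columns, and the
two-week cutoff are just for illustration):

  -- run before importing a new batch of large objects
  SELECT pg_relation_size('pg_largeobject') AS lo_bytes;

  -- if lo_bytes exceeds the threshold: unlink the oldest entries,
  -- delete their rows, and make the freed pages reusable
  SELECT lo_unlink(lo_oid)
    FROM recordings
   WHERE recorded_at < now() - interval '2 weeks';
  DELETE FROM recordings
   WHERE recorded_at < now() - interval '2 weeks';
  VACUUM pg_largeobject;  -- plain vacuum (as superuser); no exclusive lock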

--
 Daniel
 PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org