Thread: Starting Postgres when there is no disk space

Starting Postgres when there is no disk space

From
Igal Sapir
Date:
I have Postgres running in a Docker container with PGDATA mounted from the host.  Postgres has consumed all of the disk space, 130GB [1], and cannot be started [2].  The database has a lot of bloat due to many deletions.  The problem is that now I cannot start Postgres at all.

I mounted an additional partition with 100GB, hoping to fix the bloat with a TABLESPACE in the new mount, but how can I do anything if Postgres will not start in the first place?

I expected there to be a tool that can defrag the database files, e.g. a "vacuumdb" utility that can run without Postgres.  Or maybe run Postgres and disable the WAL so that no new disk space will be required.

Surely, I'm not the first one to experience this issue.  How can I fix this?

Thank you,

Igal

[1]
root@ff818ff7550a:/# du -h --max-depth=1 /pgdata
625M    /pgdata/pg_wal
608K    /pgdata/global
0       /pgdata/pg_commit_ts
0       /pgdata/pg_dynshmem
8.0K    /pgdata/pg_notify
0       /pgdata/pg_serial
0       /pgdata/pg_snapshots
16K     /pgdata/pg_subtrans
0       /pgdata/pg_twophase
16K     /pgdata/pg_multixact
130G    /pgdata/base
0       /pgdata/pg_replslot
0       /pgdata/pg_tblspc
0       /pgdata/pg_stat
0       /pgdata/pg_stat_tmp
7.9M    /pgdata/pg_xact
4.0K    /pgdata/pg_logical
0       /pgdata/tmp
130G    /pgdata

[2]
postgres@1efd26b999ca:/$ /usr/lib/postgresql/11/bin/pg_ctl start
waiting for server to start....2019-05-01 20:43:59.301 UTC [34] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2019-05-01 20:43:59.301 UTC [34] LOG:  listening on IPv6 address "::", port 5432
2019-05-01 20:43:59.303 UTC [34] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2019-05-01 20:43:59.322 UTC [35] LOG:  database system shutdown was interrupted; last known up at 2019-05-01 19:37:32 UTC
2019-05-01 20:43:59.863 UTC [35] LOG:  database system was not properly shut down; automatic recovery in progress
2019-05-01 20:43:59.865 UTC [35] LOG:  redo starts at 144/4EFFFC18
...2019-05-01 20:44:02.389 UTC [35] LOG:  redo done at 144/74FFE060
2019-05-01 20:44:02.389 UTC [35] LOG:  last completed transaction was at log time 2019-04-28 05:05:24.687581+00
.2019-05-01 20:44:03.474 UTC [35] PANIC:  could not write to file "pg_logical/replorigin_checkpoint.tmp": No space left on device
2019-05-01 20:44:03.480 UTC [34] LOG:  startup process (PID 35) was terminated by signal 6: Aborted
2019-05-01 20:44:03.480 UTC [34] LOG:  aborting startup due to startup process failure
2019-05-01 20:44:03.493 UTC [34] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

Re: Starting Postgres when there is no disk space

From
David Rowley
Date:
On Thu, 2 May 2019 at 12:07, Igal Sapir <igal@lucee.org> wrote:
> I mounted an additional partition with 100GB, hoping to fix the bloat with a TABLESPACE in the new mount, but how can
> I do anything if Postgres will not start in the first place?

You could move the pg_wal directory over to the new partition and ln
it back to its original location.
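
The move-and-symlink step can be sketched as follows. This demo runs on scratch directories so it is safe to try anywhere; with the real cluster, stop the server first and substitute the actual PGDATA and mount point (the paths here are illustrative):

```shell
# Stand-ins for /pgdata and the new 100GB mount; replace with real paths.
PGDATA=$(mktemp -d)
NEWDISK=$(mktemp -d)
mkdir "$PGDATA/pg_wal"
echo dummy > "$PGDATA/pg_wal/000000010000000000000001"

mv "$PGDATA/pg_wal" "$NEWDISK/pg_wal"     # move WAL off the full disk
ln -s "$NEWDISK/pg_wal" "$PGDATA/pg_wal"  # link it back into place
```

Postgres follows the symlink, so the directory layout it sees is unchanged while the WAL writes land on the new partition.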

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Starting Postgres when there is no disk space

From
Michael Loftis
Date:
Best option: copy/move the entire pgdata to a larger space. It may also be enough to just move the WAL (leaving a symlink), freeing up the 625M, but I doubt it, since VACUUM FULL occurs in the same tablespace and can need an equal amount of space (130G) depending on how much it can actually free up.

You may also get away with just moving (and leaving a symlink for) the base directory, but I don't recall whether that works.

--

"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler

Re: Starting Postgres when there is no disk space

From
Igal Sapir
Date:
Thank you both.  The symlink sounds like a very good idea.  My other disk is 100GB and the database is already 130GB, so moving the whole thing would require provisioning that will take more time.  I will try the symlinks first, and possibly move some tables to a tablespace on the other partition to make more room.

I have a scheduled process that runs daily to delete old data and do full vacuum.  Not sure why this happened (again).

Thanks,

Igal


Re: Starting Postgres when there is no disk space

From
Ron
Date:
To get the cluster up and running, you only need to move a GB or two.


--
Angular momentum makes the world go 'round.

Re: Starting Postgres when there is no disk space

From
Michael Nolan
Date:
Assuming you get the database back online, I would suggest you put a procedure in place to monitor disk space and alert you when it starts to get low. 
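
A minimal cron-able check along those lines might look like this; the mount point and threshold are assumptions to adapt (e.g. MOUNT=/pgdata for the setup in this thread):

```shell
# Hedged sketch of a disk-space alert. MOUNT and THRESHOLD are
# placeholders; wire the echo up to mail/pager as needed.
MOUNT=${MOUNT:-/}
THRESHOLD=${THRESHOLD:-90}
# df -P gives POSIX single-line output; column 5 is "Use%".
USED=$(df -P "$MOUNT" | awk 'NR==2 { gsub("%", "", $5); print $5 }')
if [ "$USED" -ge "$THRESHOLD" ]; then
  echo "WARNING: $MOUNT is at ${USED}% capacity"
fi
```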
--
Mike Nolan

Re: Starting Postgres when there is no disk space

From
Igal Sapir
Date:
Right.  I managed to start up Postgres by symlinking the following directories to a new mount:  pg_logical, pg_subtrans, pg_wal, pg_xact.

I then created a new tablespace on the new mount, set it to be the default tablespace, and moved some of the smaller (about 30GB) tables to it.  That allowed me to delete old data, do full vacuum, and move the data back to the original disk.

This is time-series data and there is a daily process that deletes the old data, but VACUUM FULL still fails to return the space to the OS.  Perhaps I will get better results with table partitioning, or with TimescaleDB.

Thank you for your help,

Igal


Re: Starting Postgres when there is no disk space

From
Jeff Janes
Date:
On Wed, May 1, 2019 at 10:25 PM Igal Sapir <igal@lucee.org> wrote:

I have a scheduled process that runs daily to delete old data and do full vacuum.  Not sure why this happened (again).

If you are doing a regularly scheduled "vacuum full", you are almost certainly doing something wrong.  Are these "vacuum full" runs completing, or are they failing (probably due to transient out-of-space errors)?

An ordinary non-full vacuum will make the space available for internal reuse. It will not (usually) return the space to the filesystem, so it won't get you out of the problem.  But it should prevent you from getting into the problem in the first place.  If it is failing to reuse the space adequately, you should figure out why, rather than just blindly jumping to regularly scheduled "vacuum full".  For example, what is it that is bloating: the tables themselves, their indexes, or their TOAST tables?  Or is there any bloat in the first place? Are you sure your deletions are equal to your insertions, over the long-term average?  If you are doing "vacuum full" and you are certain it is completing successfully, but it doesn't free up much space, then that is strong evidence that you don't actually have bloat; you just have more live data than you think you do.  (It could also mean you have done something silly with your "fillfactor" settings.)

If you don't want the space to be reused, to keep a high correlation between insert time and physical order of the rows for example, then you should look into partitioning, as you have already noted.

Now that you have the system up again and some space freed up, I'd create a "ballast" file with a few gig of random (to avoid filesystem-level compression, should you have such a filesystem) data on the same device that holds your main data, that can be deleted in an emergency to give you enough free space to at least start the system.  Of course, monitoring is also nice, but the ballast file is more robust and there is no reason you can't have both.
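
A sketch of creating such a ballast file (path and size are placeholders, scaled down here so it runs quickly; for real use, put it on the PGDATA device and use a few GB):

```shell
# Random bytes, so a compressing filesystem can't shrink the file away.
# Placeholder location and 4MB size; adjust both for real use.
BALLAST=$(mktemp -d)/ballast.bin
dd if=/dev/urandom of="$BALLAST" bs=1M count=4 2>/dev/null
ls -l "$BALLAST"
# In an emergency: rm "$BALLAST", then start the server.
```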

Cheers,

Jeff

Re: Starting Postgres when there is no disk space

From
Igal Sapir
Date:
Jeff,

Thank you for the tips.  I stand corrected.  These are regular VACUUM calls after the deletion, not VACUUM FULL.  It's a daily process that deletes records from N days ago, and then performs VACUUM, so yes, all of the inserted records should be deleted after N days.

The bloat is in a TOAST table.  The primary table has a JSONB column which can get quite large.  The fillfactor setting was not modified from its default value (does the primary table fillfactor affect the toast table?  either way they are both default in this case).

Ballast file is a great idea.  I was just thinking about that a couple of days ago, but instead of one file I think that I will have a bunch of them at 1GB each.  That will give me more flexibility in clearing space as needed and keeping more "safety buffers" for when I make space.

Thanks for your help,

Igal

Re: Starting Postgres when there is no disk space

From
Igal Sapir
Date:
If anyone ever needs it, I wrote this short bash loop to create 16 temp files of 640MB of random data each (two lines if you count the "config" line):

$ COUNT=16; TMPDIR=/pgdata/tmp/
$ for ((i=1; i<=COUNT; i++)); do dd if=/dev/urandom of="${TMPDIR}$(tr -cd 'a-f0-9' < /dev/urandom | head -c 20).tmp" bs=8192 count=81920; done

Which produces about 10GB of unusable space that I can free up in the event that I run out of disk (10GB might be excessive, but it works for me for the time being):

$ ls -lh $TMPDIR
total 10G
-rw-r--r-- 1 root root 640M May  3 12:42 0a81845a5de0d926572e.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 1800a815773f34b8be98.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 1b182057d9b764d3b2a8.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 40f7b4cab222699d121a.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 498e9bc0852ed83af04f.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 49e84e5189e424c012be.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 7c984b156d11b5817aa5.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 7d1195b03906e3539495.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 9677ff969c7add0e7f92.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 9ae9d483adddf3317d7c.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 a546f3f363ca733427e7.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 a965856cb1118d98f66a.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 c162da7ecdb8824e3baf.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 d7c97019ce658b90285b.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 e76fc603ffe2c977c826.tmp
-rw-r--r-- 1 root root 640M May  3 12:42 fed72361b202f9492d7f.tmp


Best,

Igal
