Thread: temporary data after diskspace error
Hi,
We have a server running PostgreSQL 9.4.12 on Ubuntu.
There has been a sudden rise in the amount of disk space used by PostgreSQL, causing a disk-space error:
2020-01-22 17:24:37 CET db: ip: us: PANIC: could not write to file "pg_xlog/xlogtemp.23346": No space left on device
2020-01-22 17:24:37 CET db: ip: us: LOG: WAL writer process (PID 23346) was terminated by signal 6: Aborted
2020-01-22 17:24:37 CET db: ip: us: LOG: terminating any other active server processes
The disk was at roughly 75% before and something or someone added 150 GB to the database, bringing the disk space usage to 100%.
The query that hit the initial error was creating a rather large table. It is not confirmed that this query is the only source of the extra data, but it is possible.
Now I can see in pg_stat_database and in postgresql/9.4/main/base/pgsql_tmp that there are 90 GB of temporary files in the database.
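(For reference, the pg_stat_database side of that check was roughly the query below; temp_files and temp_bytes are cumulative counters since the last statistics reset, while the 90 GB itself is what is currently sitting on disk under pgsql_tmp.)

-- Cumulative temp-file counts and volume per database since the last stats reset.
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
ORDER BY temp_bytes DESC;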
Could this amount of temp files have been left behind by the unfinished query? I'm not sure exactly how abrupt a termination by signal 6 (SIGABRT) is.
And also: how can I make postgres clean up the files?
Can it be done without restarting the cluster?
Will restarting it help?
--
Willy-Bas Loos
On 1/27/20 10:05 AM, Willy-Bas Loos wrote:
> And also: How can i make postgres clean up the files?
> Can it be done without restarting the cluster?
> Will restarting it help?

Restarting PostgreSQL clears out pgsql_tmp. "pg_ctl restart -D ..." should be all you need.

--
Angular momentum makes the world go 'round.
Willy-Bas Loos <willybas@gmail.com> writes:
> And also: How can i make postgres clean up the files?
> Can it be done without restarting the cluster?
> Will restarting it help?

A restart will clean out temp files. I don't think there's any
terribly safe way to do it without that. You could manually
remove such files that haven't been accessed recently, but the
risk of human error is high.

regards, tom lane
In terms of preventing this from happening again, you might consider setting a reasonable temp_file_limit as a default. Alternatively or additionally, you can set up another volume and direct temporary files to it, so that excessive temp-file usage at any one moment cannot crash the server; then only the one query that used too much space fails.
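Something along these lines would do it on 9.4 (the 20GB limit, the tablespace name, and the path are just examples to adapt; the directory must already exist and be owned by the postgres OS user):

-- Cap the temp-file space any single session may use.
ALTER SYSTEM SET temp_file_limit = '20GB';

-- Optionally put temp files on a separate volume, so sorts and hashes
-- spilling to disk cannot fill up the main data disk.
CREATE TABLESPACE temp_space LOCATION '/mnt/tempvol/pg_temp';
ALTER SYSTEM SET temp_tablespaces = 'temp_space';

-- Reload so new sessions pick up the settings.
SELECT pg_reload_conf();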
It always bears mentioning that 9.4 is very old, and upgrading to at least v11, which has been out for more than a year, is recommended.
"PostgreSQL 9.4 will stop receiving fixes on February 13, 2020, which is the next planned cumulative update release."
Ok, thanks everyone!
Will there be a lot of downtime to delete those 90GB of temp files?
Will postgres just delete those files without processing them, or should I brace for some downtime?
Op ma 27 jan. 2020 17:15 schreef Tom Lane <tgl@sss.pgh.pa.us>:
Willy-Bas Loos <willybas@gmail.com> writes:
> And also: How can i make postgres clean up the files?
> Can it be done without restarting the cluster?
> Will restarting it help?
A restart will clean out temp files. I don't think there's any
terribly safe way to do it without that. You could manually
remove such files that haven't been accessed recently, but the
risk of human error is high.
regards, tom lane
Willy-Bas Loos <willybas@gmail.com> writes:
> Will there be a lot of downtime to delete those 90GB of temp files?
> Will postgres just delete those files without processing them or should I
> brace for some downtime?

It's just a directory scan and an unlink() for each file that has
the right filename pattern to be a temp file. If you've got a
reasonable filesystem I wouldn't expect it to take long.
Maybe a minute? (but don't quote me on that)

regards, tom lane
I did the restart and it took seconds.
This was on an SSD.
BTW, on Ubuntu and Debian I never use pg_ctl directly; postgresql-common provides a very nice CLI for this.
For the restart I used: sudo pg_ctlcluster --force 9.4 main restart
Thanks for all the good advice!
On Mon, Jan 27, 2020 at 10:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Willy-Bas Loos <willybas@gmail.com> writes:
> Will there be a lot of downtime to delete those 90GB of temp files?
> Will postgres just delete those files without processing them or should I
> brace for some downtime?
It's just a directory scan and an unlink() for each file that has
the right filename pattern to be a temp file. If you've got a
reasonable filesystem I wouldn't expect it to take long.
Maybe a minute? (but don't quote me on that)
regards, tom lane
--
Willy-Bas Loos