Thread: Available disk space per tablespace
Hi,

I'm picking up a 5 year old patch again:
https://www.postgresql.org/message-id/flat/20191108132419.GG8017%40msg.df7cb.de

Users will be interested in knowing how much extra data they can load into a database, but PG currently does not expose that number. This patch introduces a new function pg_tablespace_avail() that takes a tablespace name or oid, and returns the number of bytes "available" there. This is the number without any reserved blocks (Unix, f_avail) or available to the current user (Windows).

(This is not meant to replace a full-fledged OS monitoring system that has much more numbers about disks and everything, it is filling a UX gap.)

Compared to the last patch, this just returns a single number so it's easier to use - total space isn't all that interesting, we just return the number the user wants.

The free space is included in \db+ output:

postgres =# \db+
                                       List of tablespaces
    Name    │ Owner │ Location │ Access privileges │ Options │  Size   │  Free  │ Description
────────────┼───────┼──────────┼───────────────────┼─────────┼─────────┼────────┼─────────────
 pg_default │ myon  │          │ ∅                 │ ∅       │ 23 MB   │ 538 GB │ ∅
 pg_global  │ myon  │          │ ∅                 │ ∅       │ 556 kB  │ 538 GB │ ∅
 spc        │ myon  │ /tmp/spc │ ∅                 │ ∅       │ 0 bytes │ 31 GB  │ ∅
(3 rows)

The patch has also been tested on Windows.

TODO: Figure out which systems need statfs() vs statvfs()

Christoph
Attachment
On 2025/3/14 02:10, Christoph Berg wrote:
> Hi,
>
> I'm picking up a 5 year old patch again:
> https://www.postgresql.org/message-id/flat/20191108132419.GG8017%40msg.df7cb.de
>
> Users will be interested in knowing how much extra data they can load
> into a database, but PG currently does not expose that number. This
> patch introduces a new function pg_tablespace_avail() that takes a
> tablespace name or oid, and returns the number of bytes "available"
> there. This is the number without any reserved blocks (Unix, f_avail)
> or available to the current user (Windows).
>
> (This is not meant to replace a full-fledged OS monitoring system that
> has much more numbers about disks and everything, it is filling a UX
> gap.)
>
> Compared to the last patch, this just returns a single number so it's
> easier to use - total space isn't all that interesting, we just return
> the number the user wants.
>
> The free space is included in \db+ output:
>
> postgres =# \db+
>                                        List of tablespaces
>     Name    │ Owner │ Location │ Access privileges │ Options │  Size   │  Free  │ Description
> ────────────┼───────┼──────────┼───────────────────┼─────────┼─────────┼────────┼─────────────
>  pg_default │ myon  │          │ ∅                 │ ∅       │ 23 MB   │ 538 GB │ ∅
>  pg_global  │ myon  │          │ ∅                 │ ∅       │ 556 kB  │ 538 GB │ ∅
>  spc        │ myon  │ /tmp/spc │ ∅                 │ ∅       │ 0 bytes │ 31 GB  │ ∅
> (3 rows)
>
> The patch has also been tested on Windows.
>
> TODO: Figure out which systems need statfs() vs statvfs()
>

I tested the patch on macOS. It does not work correctly there:

                                    List of tablespaces
    Name    | Owner  | Location | Access privileges | Options |  Size  | Free  | Description
------------+--------+----------+-------------------+---------+--------+-------+-------------
 pg_default | quanzl |          |                   |         | 23 MB  | 23 TB |
 pg_global  | quanzl |          |                   |         | 556 kB | 23 TB |
(2 rows)

My disk is actually 1 TB.

According to the statvfs documentation for macOS:

f_frsize   The size in bytes of the minimum unit of allocation on this
           file system.
f_bsize    The preferred length of I/O requests for files on this file
           system.

I tweaked the code a little bit. See the attachment.

                                    List of tablespaces
    Name    | Owner  | Location | Access privileges | Options |  Size  |  Free  | Description
------------+--------+----------+-------------------+---------+--------+--------+-------------
 pg_default | quanzl |          |                   |         | 22 MB  | 116 GB |
 pg_global  | quanzl |          |                   |         | 556 kB | 116 GB |
(2 rows)

In addition, many systems use 1000 rather than 1024 as 1k when displaying storage sizes. Shouldn't we consider this factor as well?

> Christoph
Attachment
Re: Quan Zongliang
> According to the statvfs documentation for macOS
> f_frsize The size in bytes of the minimum unit of allocation on this
> file system.
> f_bsize The preferred length of I/O requests for files on this file
> system.

Thanks for catching that. f_frsize is the correct field to use.

The statvfs(3) manpage on Linux has it as well, but it's less pronounced there so I missed it:

    struct statvfs {
        unsigned long  f_bsize;    /* Filesystem block size */
        unsigned long  f_frsize;   /* Fragment size */
        fsblkcnt_t     f_blocks;   /* Size of fs in f_frsize units */
        fsblkcnt_t     f_bfree;    /* Number of free blocks */
        fsblkcnt_t     f_bavail;   /* Number of free blocks for
                                      unprivileged users */

> In addition, many systems use 1000 as 1k to represent the storage size.
> Shouldn't we consider this factor as well?

That would be a different pg_size_pretty() function, unrelated to this patch.

I'm still unconvinced if we should use statfs() instead of statvfs() on *BSD or if their manpage is just trolling us and statvfs is just fine.

DESCRIPTION
    The statvfs() and fstatvfs() functions fill the structure pointed to by
    buf with garbage. This garbage will occasionally bear resemblance to
    file system statistics, but portable applications must not depend on
    this.

Christoph
Attachment
On Sat, Mar 15, 2025 at 4:40 AM Christoph Berg <myon@debian.org> wrote:
> I'm still unconvinced if we should use statfs() instead of statvfs()
> on *BSD or if their manpage is just trolling us and statvfs is just
> fine.
>
> DESCRIPTION
>     The statvfs() and fstatvfs() functions fill the structure pointed to by
>     buf with garbage. This garbage will occasionally bear resemblance to
>     file system statistics, but portable applications must not depend on
>     this.

Hah, I see this in my local FreeBSD man page. I guess this might be a reference to POSIX's 100% get-out clause "it is unspecified whether all members of the statvfs structure have meaningful values on all file systems". The statfs() man page doesn't say that (a nonstandard syscall that originated in 4.4BSD, which POSIX decided to rename because other systems sprouted incompatible statfs() interfaces?).

It's hard to imagine a system that doesn't track free space and report it here, and if it doesn't, well so what, that's probably also a system that can't report free space to the "df" command, so what are we supposed to do? We could perhaps add a note to the documentation that this field relies on the OS providing meaningful "avail" field in statvfs(), but it's hard to imagine. Maybe just defer that until someone shows up with a real report? So +1 from me, go for it, call statvfs() and don't worry.

I tried your v3 patch on my FreeBSD 14.2 battle station:

postgres=# \db+
                                    List of tablespaces
    Name    | Owner  | Location | Access privileges | Options |  Size  |  Free  | Description
------------+--------+----------+-------------------+---------+--------+--------+-------------
 pg_default | tmunro |          |                   |         | 22 MB  | 290 GB |
 pg_global  | tmunro |          |                   |         | 556 kB | 290 GB |

That is the correct answer:

tmunro@build1:~/projects/postgresql/build $ df -h .
Filesystem        Size    Used   Avail Capacity  Mounted on
zroot/usr/home    331G     41G    290G    12%    /usr/home

I also pushed your patch to CI and triggered the NetBSD and OpenBSD tasks and they passed your sanity test, though that only checks that it reported some number > 1MB.

I looked at the source, and on FreeBSD statvfs[1] is just a libc function that calls statfs() (as does df). The statfs() man page has no funny disclaimers. OpenBSD's[2] too. NetBSD seems to have a real statvfs (or statvfs1) syscall but its man page has no funny disclaimers.

+#ifdef WIN32
+    if (GetDiskFreeSpaceEx(tblspcPath, &lpFreeBytesAvailable, NULL, NULL) == false)
+        return -1;
+
+    return lpFreeBytesAvailable.QuadPart; /* ULONGLONG part of ULARGE_INTEGER */
+#else
+    if (statvfs(tblspcPath, &fst) < 0)
+        return -1;
+
+    return fst.f_bavail * fst.f_frsize; /* available blocks times fragment size */
+#endif

What's the rationale for not raising an error if the system call fails? If someone complains that it's showing -1, doesn't that mean we'll have to ask them to trace the system calls to figure out why, or if it's Windows, likely abandon all hope of ever knowing why? Should statvfs() retry on EINTR?

Style nit: maybe ! instead of == false?

Nice feature.

[1] https://github.com/freebsd/freebsd-src/blob/36782aaba4f1a7d054aa405357a8fa2bc0f94eb0/lib/libc/gen/statvfs.c#L70
[2] https://github.com/openbsd/src/blob/70ab9842eb8b368612eb098db19dcf94c19d673d/lib/libc/gen/statvfs.c#L59
Re: Thomas Munro
> Hah, I see this in my local FreeBSD man page. I guess this might be a
> reference to POSIX's 100% get-out clause "it is unspecified whether
> all members of the statvfs structure have meaningful values on all
> file systems".

Yeah I could hear someone being annoyed by POSIX echoed in that paragraph.

> system that can't report free space to the "df" command, so what are
> we supposed to do? We could perhaps add a note to the documentation
> that this field relies on the OS providing meaningful "avail" field in
> statvfs(), but it's hard to imagine. Maybe just defer that until
> someone shows up with a real report? So +1 from me, go for it, call
> statvfs() and don't worry.

I was looking into gnulib's wrapper around this - it's also basically calling statvfs() except on assorted older systems.

https://github.com/coreutils/gnulib/blob/master/lib/fsusage.c#L114

Do we care about any of these?

AIX
OSF/1
2.6 < glibc/Linux < 2.6.36
glibc/Linux < 2.6, 4.3BSD, SunOS 4, \
  Mac OS X < 10.4, FreeBSD < 5.0, \
  NetBSD < 3.0, OpenBSD < 4.4k
SunOS 4.1.2, 4.1.3, and 4.1.3_U1
4.4BSD and older NetBSD
SVR3, old Irix

If not, then statvfs seems safe.

> I also pushed your patch to CI and triggered the NetBSD and OpenBSD
> tasks and they passed your sanity test, though that only checks that
> the reported some number > 1MB.

I thought about making that test "between 1MB and 10PB", but that seemed silly - it's not testing much, and some day, someone will try to run the test on a system where it will still fail.

> What's the rationale for not raising an error if the system call
> fails?

That's mirroring the behavior of calculate_tablespace_size() in the same file. I thought that's to allow \db+ to succeed even if some of the tablespaces are botched/missing/whatever. But now on closer inspection, I see that db_dir_size() is erroring out on problems, it just ignores the top-level directory missing.

Fixed in the attached patch.

\db+
ERROR:  XX000: could not statvfs directory "pg_tblspc/16384/PG_18_202503111": Too many levels of symbolic links
LOCATION:  calculate_tablespace_avail, dbsize.c:373

But this is actually something I wanted to address in a follow-up patch: Currently, non-superusers cannot run \db+ because they lack CREATE on pg_global (but `\db+ pg_default` works). Should we rather make pg_tablespace_size and pg_tablespace_avail return NULL for insufficient permissions instead of throwing an error?

> If someone complains that it's showing -1, doesn't that mean

(-1 is translated to NULL for the SQL level.)

> we'll have to ask them to trace the system calls to figure out why, or
> if it's Windows, likely abandon all hope of ever knowing why? Should
> statvfs() retry on EINTR?

Hmm. Is looping on EINTR worth the trouble?

> Style nit: maybe ! instead of == false?

Changed.

> Nice feature.

Thanks!

Christoph
Attachment
On Sat, 2025-03-15 at 13:09 +0100, Christoph Berg wrote:
> Do we care about any of these?
>
> AIX

We dropped support for it, but there are efforts to change that.

Yours,
Laurenz Albe
On Sun, Mar 16, 2025 at 1:17 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Sat, 2025-03-15 at 13:09 +0100, Christoph Berg wrote:
> > Do we care about any of these?
> >
> > AIX
>
> We dropped support for it, but there are efforts to change that.

FWIW AIX does have it, according to its manual, in case it comes back. The others in the list are defunct or obsolete versions.
On Sun, Mar 16, 2025 at 1:09 AM Christoph Berg <myon@debian.org> wrote:
> Hmm. Is looping on EINTR worth the trouble?

I was just wondering if it might be one of those oddballs that ignores SA_RESTART, but I guess that doesn't seem too likely (I mean, first you'd probably have to have a reason to sleep or some other special reason, and who knows what some unusual file systems might do). It certainly doesn't on the systems I tried. So I guess not until we have other evidence.
Re: Thomas Munro
> > Hmm. Is looping on EINTR worth the trouble?
>
> I was just wondering if it might be one of those oddballs that ignores
> SA_RESTART, but I guess that doesn't seem too likely (I mean, first
> you'd probably have to have a reason to sleep or some other special
> reason, and who knows what some unusual file systems might do). It
> certainly doesn't on the systems I tried. So I guess not until we
> have other evidence.

Gnulib's get_fs_usage() (which is what GNU coreutils' df uses) does not handle EINTR either.

There is some code that does int width expansion, but I believe we don't need that since the `fst.f_bavail * fst.f_frsize` multiplication takes care of converting that to int64 (if it wasn't already 64 bits before).

Christoph
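Whether the width expansion matters depends on the platform's types. Where fsblkcnt_t or unsigned long is already 64 bits wide, the product is computed in 64 bits. In C, however, a multiplication is carried out in the type of its operands after the usual arithmetic conversions, so if both were only 32 bits wide, the product would wrap before being converted to the wider return type. The sketch below uses hypothetical function names and fixed 32-bit parameters as a stand-in for such a platform; it is an assumption for illustration, not a statement about any system discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical demo (not from the patch). With 32-bit operands the product
 * is computed in 32 bits and wraps modulo 2^32; only the wrapped result is
 * then converted to the 64-bit return type.
 */
static uint64_t
mul_narrow(uint32_t blocks, uint32_t frsize)
{
    return blocks * frsize;
}

/*
 * Casting one operand first makes the whole multiplication 64-bit, so the
 * full product is preserved.
 */
static uint64_t
mul_widened(uint32_t blocks, uint32_t frsize)
{
    assert(frsize != 0);        /* a zero fragment size would be meaningless */
    return (uint64_t) blocks * frsize;
}
```

For example, 3000000 blocks of 4096 bytes is 12288000000 bytes when widened first, but 3698065408 when the product wraps at 32 bits.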