Re: FileFallocate misbehaving on XFS - Mailing list pgsql-hackers
From: Michael Harris <harmic@gmail.com>
Subject: Re: FileFallocate misbehaving on XFS
Date:
Msg-id: CADofcAUOqdrEhZj6-3h3GKz2k7J1pJe4pQ0W-PEibOj2=vrScA@mail.gmail.com
In response to: Re: FileFallocate misbehaving on XFS (Michael Harris <harmic@gmail.com>)
List: pgsql-hackers
Hi again

One extra piece of information: I had said that all the machines were
Rocky Linux 8 or Rocky Linux 9, but in fact a large number of them are
RHEL8. Sorry for the confusion. Of course RL8 is a rebuild of RHEL8, so
it is not surprising that they behave similarly.

Cheers
Mike

On Tue, 10 Dec 2024 at 17:28, Michael Harris <harmic@gmail.com> wrote:
>
> Hi Andres
>
> Following up on the earlier question about OS upgrade paths - all the
> cases reported so far are either on RL8 (kernel 4.18.0) or were
> upgraded to RL9 (kernel 5.14.0) with the affected filesystems
> preserved.
> In fact the RL9 systems were initially built as CentOS 7, and then
> when that went EOL they were upgraded to RL9. The process was as I
> described - the /var/opt filesystem which contained the database was
> preserved, and the root and other OS filesystems were scratched.
> The majority of systems where we have this problem are on RL8.
>
> On Tue, 10 Dec 2024 at 11:31, Andres Freund <andres@anarazel.de> wrote:
> > Are you using any filesystem quotas?
>
> No.
>
> > It'd be useful to get the xfs_info output that Jakub asked for. Perhaps also
> > xfs_spaceman -c 'freesp -s' /mountpoint
> > xfs_spaceman -c 'health' /mountpoint
> > and df.
>
> I gathered this info from one of the systems that is currently on RL9.
> This system is relatively small compared to some of the others that
> have exhibited this issue, but it is the only one I can access right
> now.
>
> # uname -a
> Linux 5.14.0-503.14.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 15
> 12:04:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
>
> # xfs_info /dev/mapper/ippvg-ipplv
> meta-data=/dev/mapper/ippvg-ipplv isize=512    agcount=4, agsize=262471424 blks
>          =                        sectsz=512   attr=2, projid32bit=1
>          =                        crc=1        finobt=0, sparse=0, rmapbt=0
>          =                        reflink=0    bigtime=0 inobtcount=0 nrext64=0
> data     =                        bsize=4096   blocks=1049885696, imaxpct=5
>          =                        sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=512639, version=2
>          =                        sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> # for agno in `seq 0 3`; do xfs_spaceman -c "freesp -s -a $agno" /var/opt; done
>    from      to extents    blocks    pct
>       1       1   37502     37502   0.15
>       2       3   62647    148377   0.59
>       4       7   87793    465950   1.85
>       8      15  135529   1527172   6.08
>      16      31  184811   3937459  15.67
>      32      63  165979   7330339  29.16
>      64     127  101674   8705691  34.64
>     128     255   15123   2674030  10.64
>     256     511     973    307655   1.22
> total free extents 792031
> total free blocks 25134175
> average free extent size 31.7338
>    from      to extents    blocks    pct
>       1       1   43895     43895   0.22
>       2       3   59312    141693   0.70
>       4       7   83406    443964   2.20
>       8      15  120804   1362108   6.75
>      16      31  133140   2824317  14.00
>      32      63  118619   5188474  25.71
>      64     127   77960   6751764  33.46
>     128     255   16383   2876626  14.26
>     256     511    1763    546506   2.71
> total free extents 655282
> total free blocks 20179347
> average free extent size 30.7949
>    from      to extents    blocks    pct
>       1       1   72034     72034   0.26
>       2       3   98158    232135   0.83
>       4       7  126228    666187   2.38
>       8      15  169602   1893007   6.77
>      16      31  180286   3818527  13.65
>      32      63  164529   7276833  26.01
>      64     127  109687   9505160  33.97
>     128     255   22113   3921162  14.02
>     256     511    1901    592052   2.12
> total free extents 944538
> total free blocks 27977097
> average free extent size 29.6199
>    from      to extents    blocks    pct
>       1       1   51462     51462   0.21
>       2       3   98993    233204   0.93
>       4       7  131578    697655   2.79
>       8      15  178151   1993062   7.97
>      16      31  175718   3680535  14.72
>      32      63  145310   6372468  25.48
>      64     127   89518   7749021  30.99
>     128     255   18926   3415768  13.66
>     256     511    2640    813586   3.25
> total free extents 892296
> total free blocks 25006761
> average free extent size 28.0252
>
> # xfs_spaceman -c 'health' /var/opt
> Health status has not been collected for this filesystem.
>
> > What kind of storage is this on?
>
> As mentioned, there are quite a few systems at different sites, so a
> number of different storage solutions are in use - some with directly
> attached disks, others with SAN solutions.
> The instance I got the printout above from is a VM, but at the other
> site they are all bare metal.
>
> > Was the filesystem ever grown from a smaller size?
>
> I can't say for sure that none of them were, but given the number of
> different systems that have this issue, I am confident that would not
> be a common factor.
>
> > Have you checked the filesystem's internal consistency? I.e. something like
> > xfs_repair -n /dev/nvme2n1. It does require the filesystem to be read-only or
> > unmounted though. But corrupted filesystem datastructures certainly could
> > cause spurious ENOSPC.
>
> I executed this on the same system the prints above came from. It
> did not report any issues.
>
> > Are you using pg_upgrade -j?
>
> Yes, we use -j `nproc`
>
> > I assume the file that actually errors out changes over time? It's always
> > fallocate() that fails?
>
> Yes, correct, on both counts.
>
> > Can you tell us anything about the workload / data? Lots of tiny tables, lots
> > of big tables, write heavy, ...?
>
> It is a write-heavy application which stores mostly time series data.
>
> The time series data is partitioned by time; the application writes
> constantly into the 'current' partition, and data is expired by
> removing the oldest partition. Most of the data is written once and
> not updated.
>
> There are quite a lot of these partitioned tables (in the 1000s or
> 10000s, depending on how the application is configured). Individual
> partitions range in size from a few MB to tens of GB.
>
> Cheers
> Mike.
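Since it is always fallocate() that fails, one way to take PostgreSQL out
of the picture is to probe the call directly on an affected mount. Below
is a minimal standalone sketch in C using the POSIX posix_fallocate()
interface - it is not PostgreSQL's actual FileFallocate() (see
src/backend/storage/file/fd.c), and the probe file name and default size
are made up for illustration:

/*
 * fallocate_probe.c - check whether posix_fallocate() itself fails on a
 * given filesystem, independently of PostgreSQL.
 *
 * Usage: ./fallocate_probe [path] [bytes]
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "fallocate_probe.tmp";
    long long   len = (argc > 2) ? atoll(argv[2]) : 128 * 1024; /* arbitrary default */
    int         fd, rc;

    fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* posix_fallocate() reports errors via its return value, not errno */
    rc = posix_fallocate(fd, 0, (off_t) len);
    if (rc != 0)
        fprintf(stderr, "posix_fallocate(\"%s\", 0, %lld): %s\n",
                path, len, strerror(rc));
    else
        printf("allocated %lld bytes at \"%s\"\n", len, path);

    close(fd);
    unlink(path);
    return rc ? 1 : 0;
}

If this reports ENOSPC while df still shows plenty of free space, that
would point at the filesystem's extent allocation rather than anything
PostgreSQL-specific.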