Re: FileFallocate misbehaving on XFS - Mailing list pgsql-hackers
From | Michael Harris |
---|---|
Subject | Re: FileFallocate misbehaving on XFS |
Date | |
Msg-id | CADofcAWphm3uMtXZVCwko15E47HVhksR5YZ2pWhUpEjNz6Hbmw@mail.gmail.com |
In response to | Re: FileFallocate misbehaving on XFS (Andres Freund <andres@anarazel.de>) |
Responses |
Re: FileFallocate misbehaving on XFS
Re: FileFallocate misbehaving on XFS |
List | pgsql-hackers |
Hi Andres

Following up on the earlier question about OS upgrade paths - all the cases reported so far are either on RL8 (Kernel 4.18.0) or were upgraded to RL9 (kernel 5.14.0) and the affected filesystems were preserved.

In fact the RL9 systems were initially built as Centos 7, and then when that went EOL they were upgraded to RL9. The process was as I described - the /var/opt filesystem which contained the database was preserved, and the root and other OS filesystems were scratched.

The majority of systems where we have this problem are on RL8.

On Tue, 10 Dec 2024 at 11:31, Andres Freund <andres@anarazel.de> wrote:
> Are you using any filesystem quotas?

No.

> It'd be useful to get the xfs_info output that Jakub asked for. Perhaps also
> xfs_spaceman -c 'freesp -s' /mountpoint
> xfs_spaceman -c 'health' /mountpoint
> and df.

I gathered this info from one of the systems that is currently on RL9. This system is relatively small compared to some of the others that have exhibited this issue, but it is the only one I can access right now.

# uname -a
Linux 5.14.0-503.14.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 15 12:04:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

# xfs_info /dev/mapper/ippvg-ipplv
meta-data=/dev/mapper/ippvg-ipplv isize=512    agcount=4, agsize=262471424 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0, rmapbt=0
         =                       reflink=0    bigtime=0 inobtcount=0 nrext64=0
data     =                       bsize=4096   blocks=1049885696, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=512639, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# for agno in `seq 0 3`; do xfs_spaceman -c "freesp -s -a $agno" /var/opt; done
   from      to extents  blocks    pct
      1       1   37502   37502   0.15
      2       3   62647  148377   0.59
      4       7   87793  465950   1.85
      8      15  135529 1527172   6.08
     16      31  184811 3937459  15.67
     32      63  165979 7330339  29.16
     64     127  101674 8705691  34.64
    128     255   15123 2674030  10.64
    256     511     973  307655   1.22
total free extents 792031
total free blocks 25134175
average free extent size 31.7338
   from      to extents  blocks    pct
      1       1   43895   43895   0.22
      2       3   59312  141693   0.70
      4       7   83406  443964   2.20
      8      15  120804 1362108   6.75
     16      31  133140 2824317  14.00
     32      63  118619 5188474  25.71
     64     127   77960 6751764  33.46
    128     255   16383 2876626  14.26
    256     511    1763  546506   2.71
total free extents 655282
total free blocks 20179347
average free extent size 30.7949
   from      to extents  blocks    pct
      1       1   72034   72034   0.26
      2       3   98158  232135   0.83
      4       7  126228  666187   2.38
      8      15  169602 1893007   6.77
     16      31  180286 3818527  13.65
     32      63  164529 7276833  26.01
     64     127  109687 9505160  33.97
    128     255   22113 3921162  14.02
    256     511    1901  592052   2.12
total free extents 944538
total free blocks 27977097
average free extent size 29.6199
   from      to extents  blocks    pct
      1       1   51462   51462   0.21
      2       3   98993  233204   0.93
      4       7  131578  697655   2.79
      8      15  178151 1993062   7.97
     16      31  175718 3680535  14.72
     32      63  145310 6372468  25.48
     64     127   89518 7749021  30.99
    128     255   18926 3415768  13.66
    256     511    2640  813586   3.25
total free extents 892296
total free blocks 25006761
average free extent size 28.0252

# xfs_spaceman -c 'health' /var/opt
Health status has not been collected for this filesystem.

> What kind of storage is this on?

As mentioned, there are quite a few systems in different sites, so a number of different storage solutions in use, some with directly attached disks, others with some SAN solutions. The instance I got the printout above from is a VM, but in the other site they are all bare metal.

> Was the filesystem ever grown from a smaller size?

I can't say for sure that none of them were, but given the number of different systems that have this issue I am confident that would not be a common factor.

> Have you checked the filesystem's internal consistency? I.e. something like
> xfs_repair -n /dev/nvme2n1. It does require the filesystem to be read-only or
> unmounted though. But corrupted filesystem datastructures certainly could
> cause spurious ENOSPC.

I executed this on the same system as the above prints came from. It did not report any issues.

> Are you using pg_upgrade -j?

Yes, we use -j `nproc`

> I assume the file that actually errors out changes over time? It's always
> fallocate() that fails?

Yes, correct, on both counts.

> Can you tell us anything about the workload / data? Lots of tiny tables, lots
> of big tables, write heavy, ...?

It is a write heavy application which stores mostly time series data.

The time series data is partitioned by time; the application writes constantly into the 'current' partition, and data is expired by removing the oldest partition. Most of the data is written once and not updated.

There are quite a lot of these partitioned tables (in the 1000's or 10000's) depending on how the application is configured. Individual partitions range in size from a few MB to 10s of GB.

Cheers
Mike.
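
For anyone who wants to compare the per-AG free space histograms above without reading them by eye, here is a minimal Python sketch that parses one `freesp -s` histogram and recomputes its totals. It is not part of the original report. The function names, the 16-block threshold for a "large" extent, and the default block size (taken from bsize=4096 in the xfs_info output) are illustrative assumptions, and the sample is the first allocation group's output copied from this message.

```python
# Sketch only: summarize one `xfs_spaceman -c "freesp -s"` histogram.
# Names and the 16-block "large extent" threshold are assumptions.

SAMPLE_AG0 = """\
   from      to extents  blocks    pct
      1       1   37502   37502   0.15
      2       3   62647  148377   0.59
      4       7   87793  465950   1.85
      8      15  135529 1527172   6.08
     16      31  184811 3937459  15.67
     32      63  165979 7330339  29.16
     64     127  101674 8705691  34.64
    128     255   15123 2674030  10.64
    256     511     973  307655   1.22
total free extents 792031
total free blocks 25134175
average free extent size 31.7338
"""


def parse_freesp(text):
    """Return (from, to, extents, blocks) tuples for each histogram row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have five fields and the first four are integers;
        # the header and the "total"/"average" lines fail this check.
        if len(fields) == 5 and all(f.isdigit() for f in fields[:4]):
            lo, hi, extents, blocks = map(int, fields[:4])
            rows.append((lo, hi, extents, blocks))
    return rows


def summarize(rows, min_blocks=16, block_size=4096):
    """Recompute totals and the share of free blocks in larger extents."""
    free_extents = sum(r[2] for r in rows)
    free_blocks = sum(r[3] for r in rows)
    # Free blocks sitting in buckets whose lower bound is >= min_blocks.
    large_blocks = sum(r[3] for r in rows if r[0] >= min_blocks)
    avg = free_blocks / free_extents
    return {
        "free_extents": free_extents,
        "free_blocks": free_blocks,
        "avg_extent_blocks": avg,
        "avg_extent_bytes": avg * block_size,
        "share_in_large_extents": large_blocks / free_blocks,
    }


if __name__ == "__main__":
    for key, value in summarize(parse_freesp(SAMPLE_AG0)).items():
        print(f"{key}: {value}")
```

Running the same functions over each of the four histograms gives a quick side-by-side view of how fragmented the free space in each allocation group is; the recomputed totals should agree with the "total" lines that xfs_spaceman prints.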